(57) A hypertext document and anchor sentences of
parent documents for the hypertext document are registered
with a hypertext document identifier as document
information for each of the hypertext documents
having reference relationships with each other. A user
can refer to one hypertext document according to an 
anchor sentence of another hypertext document func- 
tioning as a parent document. Also, occurrence posi- 
tions of one word in hypertext documents and parent 
documents are registered as word information for each 
of words. When a keyword is input, a plurality of partic- 
ular hypertext documents and particular parent docu- 
ments in which the keyword appears are specified 
according to the word information, one particular hyper- 
text document and corresponding particular parent doc- 
uments are unified to a unified hypertext document for 
each particular hypertext document, an occurrence fre- 
quency of the keyword in each unified hypertext docu- 
ment is calculated according to the document 
information, importance degrees of the unified hypertext 
documents are calculated as those of the particular 
hypertext documents according to the occurrence
frequencies, and the ranking of the particular hypertext
documents is determined according to those importance
degrees. Because the occurrence frequency is calculated
by considering the parent documents, the particular
hypertext documents can be appropriately ranked.
Description

BACKGROUND OF THE INVENTION 

1. FIELD OF THE INVENTION: 

The present invention relates generally to a hyper- 
text document retrieving apparatus, and more particu- 
larly to a hypertext document retrieving apparatus in 
which a plurality of hypertext documents likely to meet a 
user's retrieval request are retrieved from a large vol- 
ume of hypertext documents and are presented to the 
user. 

2. DESCRIPTION OF THE RELATED ART: 
2.1. PREVIOUSLY PROPOSED ART: 

As a conventional apparatus in which one or more 
documents likely to meet a user's retrieval request are 
retrieved from a large volume of documents and are 
presented to the user, a document retrieving apparatus 
200 shown in Fig. 1 is known. In this apparatus 200, a 
large volume of documents stored in a document man- 
aging unit 201 are analyzed in advance in a retrieval 
index developing unit 202, and it is examined how many 
times each of a plurality of words registered in a diction- 
ary of the retrieval index developing unit 202 appears in 
each of the documents. That is, an occurrence fre- 
quency of each word in one document is calculated for 
each of the documents stored in the document manag- 
ing unit 201, a deviation degree IDF of one word in the 
total documents is calculated as a correction factor for 
the word for each of the words, a normalized occurrence 
frequency (called a TF value) of each word is calculated 
for each of the documents, an estimated value of each 
document expressed by TF*IDF is calculated for each of 
the words by multiplying the deviation degree and the 
normalized occurrence frequency together, and a 
retrieval index is developed in the retrieval index devel- 
oping unit 202. In the retrieval index, a set of one word, 
identification data indicating one or more documents in 
which the word appears and one estimated value for the 
word is registered for each of the words. 

Thereafter, when a plurality of keywords input by a 
user 207 are received in a keyword input unit 203, the 
keywords are transmitted to a retrieving unit 204. In the 
retrieving unit 204, a plurality of retrieval words agreeing 
with the input keywords are found out from the retrieval 
index stored in the retrieval index developing unit 202, a 
particular set of one retrieval word, identification data 
indicating one or more retrieval documents in which the 
retrieval word appears and one estimated value for the 
retrieval word is taken out for each of the retrieval words 
from the retrieval index developing unit 202, and the 
particular sets corresponding to the keywords are trans- 
mitted to a document ranking determining unit 205. 

In the document ranking determining unit 205, a
plurality of identification titles indicating the retrieval
documents are arranged in decreasing order of the
estimated values of the retrieval documents to determine
the ranking of the retrieval documents, and the
identification titles arranged according to the ranking of the
retrieval documents are displayed as a retrieval result in
a retrieval result displaying unit 206. Thereafter, when
the user selects the identification titles displayed on the
displaying unit 206 one after another in the arranged
order, the retrieval document indicated by the selected
identification title is read out from the document
managing unit 201 and is displayed on the retrieval result
displaying unit 206 each time one identification title is
selected.

Therefore, because the keywords according to a
user's retrieval request are input by the user, a plurality 
of documents likely to meet the user's retrieval request 
can be presented in the order of the estimated value 
TF*IDF. 

A plurality of calculation methods of the estimated
value TF*IDF are known. As an example of one calculation
method, the deviation degree IDF (= 1 - log(Nw/N)),
obtained by subtracting a logarithmic value log(Nw/N)
of the ratio Nw/N from 1, is defined. Here, the symbol Nw
denotes the number of documents in which a remarked
word appears, and the symbol N denotes the number of
documents stored in the document managing unit 201.
Also, the normalized occurrence frequency
TF (= Fo/Nwd), obtained by dividing an occurrence
frequency Fo of the remarked word in a remarked document
by the number Nwd of words appearing in the
remarked document, is defined. In this case, the estimated
value TF*IDF is calculated by multiplying the
deviation degree and the normalized occurrence
frequency together.
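
The calculation method described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the base of the logarithm, so the natural logarithm is assumed, and the function names are hypothetical.

```python
import math


def deviation_degree(n_docs_with_word: int, n_docs: int) -> float:
    """IDF = 1 - log(Nw / N). The log base is an assumption (natural log)."""
    return 1.0 - math.log(n_docs_with_word / n_docs)


def normalized_frequency(occurrences: int, n_words_in_doc: int) -> float:
    """TF = Fo / Nwd."""
    return occurrences / n_words_in_doc


def estimated_value(occurrences: int, n_words_in_doc: int,
                    n_docs_with_word: int, n_docs: int) -> float:
    """Estimated value TF*IDF of one word for one document."""
    tf = normalized_frequency(occurrences, n_words_in_doc)
    idf = deviation_degree(n_docs_with_word, n_docs)
    return tf * idf
```

With this definition, a word appearing in every document has IDF = 1, and a rarer word receives a larger correction factor.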

The details of the estimated value TF*IDF and a
conventional document retrieving apparatus in which the
estimated value TF*IDF is used are disclosed in the
literature (Salton, Gerard: Introduction to Modern
Information Retrieval, McGraw-Hill Computer Science Series,
1983).

2.2. PROBLEMS TO BE SOLVED BY THE INVENTION:


However, in cases where one or more particular
hypertext documents likely to meet a user's retrieval
request are retrieved from a large volume of hypertext
documents by using the conventional document retrieving
apparatus, the ranking of the particular hypertext
documents cannot be appropriately determined, because
hypertext documents are generally not independent of
each other but often have reference relationships with
each other. That is, because the contents of a plurality of
particular hypertext documents having a referential
relationship with each other are often connected by a
consistent meaning, the contents cannot be understood by
reading only one particular hypertext document but can
be understood only by reading all of the particular
hypertext documents. Therefore, in cases where the
conventional document retrieving apparatus is used, an
importance degree of each particular hypertext document
is erroneously estimated, so that the ranking of the
particular hypertext documents cannot be appropriately
determined. Also, even though the particular hypertext
documents ranked according to their estimated values
are displayed, because the ranking is not appropriately
determined, there is another drawback that the user
cannot smoothly select the particular hypertext documents
in an appropriate importance degree order.

In particular, because hypertext documents written in
a hypertext mark-up language (HTML) in a world wide web
are very likely to have referential relationships with
each other, the ranking of the particular hypertext
documents cannot be appropriately determined, and the
user cannot smoothly select each of the particular
hypertext documents even though the particular hypertext
documents ranked according to their estimated values
are displayed.

SUMMARY OF THE INVENTION 

An object of the present invention is to provide, with 
due consideration to the drawbacks of such a conven- 
tional document retrieving apparatus, a hypertext docu- 
ment retrieving apparatus in which one or more 
hypertext documents likely to meet a user's retrieval 
request are retrieved from a large volume of hypertext 
documents and are appropriately ranked according to 
their importance degrees to smoothly select each of the 
hypertext documents even though the hypertext docu- 
ments are written in the hypertext mark-up language in 
the world wide web. 

To achieve the object of the present invention, in a 
hypertext document retrieving apparatus, a plurality of 
particular hypertext documents likely to meet a user's 
retrieval request are retrieved from a group of hypertext 
documents having reference relationships with each 
other in which one hypertext document having an 
anchor sentence functions as a parent document for 
another hypertext document functioning as a reference 
document and a user refers to one reference document 
after the user selects one anchor sentence of one par- 
ent document corresponding to the reference docu- 
ment. 

In detail, in hypertext document table preparing
means, hypertext document information is prepared for
each of the hypertext documents. In the hypertext
document information, one hypertext document identifier
identifying one hypertext document, a body of the
hypertext document, a parent document identifier
identifying a parent document corresponding to the
hypertext document functioning as one reference
document, and an anchor sentence of the parent document
are registered. A hypertext document table of the
hypertext document information for all hypertext
documents is prepared in advance.
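
One way to picture the hypertext document table is as a mapping from document identifier to a record holding the body and a list of (parent identifier, anchor sentence) pairs. The sketch below is illustrative only; the class names, field names, and the `build_table` helper are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class ParentLink:
    parent_id: int          # identifier of the parent document
    anchor_sentence: str    # anchor sentence written in that parent


@dataclass
class DocumentInfo:
    doc_id: int
    body: str
    parents: list[ParentLink] = field(default_factory=list)


def build_table(docs: dict[int, str],
                links: list[tuple[int, int, str]]) -> dict[int, DocumentInfo]:
    """docs: doc_id -> body; links: (parent_id, reference_id, anchor sentence)."""
    table = {doc_id: DocumentInfo(doc_id, body) for doc_id, body in docs.items()}
    for parent_id, reference_id, anchor in links:
        # Register the parent and its anchor sentence under the reference document.
        table[reference_id].parents.append(ParentLink(parent_id, anchor))
    return table
```

Storing the anchor sentences with the reference document, rather than only with the parent, is what later allows the keyword frequency to be counted over the document together with its parents.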

Thereafter, in retrieval index preparing means, a
plurality of words appearing in each of the hypertext
documents and the parent documents are recognized
according to the hypertext document table prepared by
the hypertext document table preparing means, and a
plurality of occurrence positions of the words in each of
the hypertext documents and the parent documents are
recognized according to the hypertext document table.
Word information, composed of one or more occurrence
document identifiers identifying one or more hypertext
documents in which one word appears and occurrence
positions of the word in the hypertext documents and in
one or more anchor sentences of one or more parent
documents corresponding to the hypertext documents,
is prepared for each of the words, and a retrieval index
of pieces of word information for the words is prepared
in advance.

Thereafter, when a keyword indicating the user's
retrieval request is received in keyword receiving
means, particular word information corresponding to the
keyword is retrieved in retrieving means from the
retrieval index prepared by the retrieval index preparing
means. Also, a plurality of particular occurrence
document identifiers identifying a plurality of particular
hypertext documents in which the keyword appears and a
plurality of particular occurrence positions of the
keyword in the particular hypertext documents and in one
or more particular anchor sentences of one or more
particular parent documents corresponding to the
particular hypertext documents are retrieved from the
particular word information.

Thereafter, in document ranking determining
means, the particular hypertext documents identified by
the particular occurrence document identifiers are
specified, pieces of particular hypertext document
information for the particular hypertext documents are
retrieved from the hypertext document table prepared by
the hypertext document table preparing means, one
particular hypertext document and one or more particular
parent documents corresponding to the particular
hypertext document are unified to a unified hypertext
document for each of the particular hypertext documents,
an occurrence frequency of the keyword in one
unified hypertext document is calculated for each
unified hypertext document, a plurality of importance
degrees of the unified hypertext documents are
determined according to the occurrence frequencies in the
unified hypertext documents, one importance degree of
one unified hypertext document is set as an importance
degree of one particular hypertext document
corresponding to the unified hypertext document for each
unified hypertext document, and the ranking of the
particular hypertext documents is determined according to
the importance degrees of the unified hypertext
documents.

Thereafter, a plurality of indexes of the particular 
hypertext documents are displayed by retrieval result 
displaying means in a ranked order corresponding to 
the ranking of the particular hypertext documents as a
retrieval result. 

Because one unified hypertext document is prepared
by unifying one particular hypertext document
and one or more particular parent documents
corresponding to the particular hypertext document for each
of the particular hypertext documents and one importance
degree of one unified hypertext document is calculated
as one importance degree of one particular
hypertext document corresponding to the unified
hypertext document for each of the unified hypertext
documents, the ranking of the particular hypertext
documents can be determined by considering the
particular parent documents having the reference
relationships with the particular hypertext documents.
Therefore, even though contents of a plurality of specific
hypertext documents having a referential relationship
with each other are connected with a consistent meaning,
the specific hypertext documents likely to meet the
user's retrieval request can be correctly retrieved from a
large volume of hypertext documents and be appropriately
ranked according to their importance degrees, so
that the user can smoothly select the specific hypertext
documents in an appropriate importance degree order
even though the specific hypertext documents are written
in the hypertext mark-up language in the world wide
web.

BRIEF DESCRIPTION OF THE DRAWINGS 

The objects, features and advantages of the
present invention will be apparent from the following 
description taken in conjunction with the accompanying 
drawings, in which: 

Fig. 1 is a block diagram of a conventional
document retrieving apparatus;
Fig. 2 shows a reference relationship among a plu- 
rality of hypertext documents distributively man- 
aged in a world wide web of an internet; 
Fig. 3 is a block diagram of a hypertext retrieving
apparatus according to a first embodiment of the 
present invention; 

Fig. 4 shows a hypertext document table of pieces 
of hypertext document information prepared in a 
hypertext document table with parent document list
preparing unit shown in Fig. 3; 
Fig. 5 shows a retrieval index of pieces of word 
information prepared in a retrieval index preparing 
unit shown in Fig. 3; 

Fig. 6 is a block diagram of a hypertext retrieving
apparatus according to a second embodiment of 
the present invention; 

Fig. 7 shows an example of a retrieval result in
which an index of one particular hypertext document
is displayed with an index of a first-stage particular
parent document and an index of a second-stage
particular parent document for each particular
hypertext document by a retrieval result displaying
unit shown in Fig. 6;

Fig. 8 is a block diagram of a hypertext retrieving 
apparatus according to a third embodiment of the 
present invention; 

Fig. 9 shows an example of a retrieval result in 
which indexes of a plurality of particular hypertext 
documents are displayed with an index of a first- 
stage particular parent document and an index of a 
second-stage particular parent document by a 
retrieval result displaying unit shown in Fig. 8; 
Fig. 10 is a block diagram of a hypertext retrieving 
apparatus according to a fourth embodiment of the 
present invention; 

Fig. 11 is a block diagram of a hypertext retrieving 
apparatus according to a fifth embodiment of the 
present invention; 

Fig. 12 shows an example of a retrieval result in
which an index of one particular hypertext document
is displayed with a summary of the particular
hypertext document, an index of a first-stage particular
parent document and an index of a second-stage
particular parent document for each particular
hypertext document by a retrieval result displaying
unit shown in Fig. 11;

Fig. 13 is a block diagram of a hypertext retrieving 
apparatus according to a sixth embodiment of the 
present invention; 

Fig. 14 is a block diagram of a hypertext retrieving 
apparatus according to a seventh embodiment of 
the present invention; 

Fig. 15 is a block diagram of a hypertext retrieving 
apparatus according to an eighth embodiment of 
the present invention; 

Fig. 16 is a block diagram of a hypertext retrieving 
apparatus according to a ninth embodiment of the 
present invention; 

Fig. 17 shows the division of a long hypertext
document with one or more reference labels;
Fig. 18 is a block diagram of a hypertext retrieving 
apparatus according to a tenth embodiment of the 
present invention; 

Fig. 19 shows an example of a retrieval result, in 
which indexes of hypertext documents and buttons 
corresponding to a plurality of high-ranking related 
words are displayed, according to the tenth embod- 
iment; 

Fig. 20 is a block diagram of a hypertext retrieving 
apparatus according to an eleventh embodiment of 
the present invention; and 
Fig. 21 shows an example of a retrieval result, in 
which indexes of hypertext documents and buttons 
corresponding to a plurality of high-ranking related 
words are displayed, according to the eleventh 
embodiment. 
DETAILED DESCRIPTION OF THE EMBODIMENTS

Preferred embodiments of a hypertext document
retrieving apparatus, in which one or more particular
hypertext documents likely to meet a user's retrieval
request are retrieved from a large volume of hypertext
documents distributively managed in a world wide web
of an internet, are described with reference to the
drawings according to the concept of the present invention.

Fig. 2 shows a reference relationship among a
plurality of hypertext documents distributively managed in
a world wide web of an internet.

As shown in Fig. 2, a plurality of hypertext
documents D80 to D86 distributively managed in a world
wide web of an internet have a referential relationship
with each other. That is, an anchor sentence S800 is
placed in the hypertext document D80, an anchor
sentence S801 is placed in the hypertext document D81, an
anchor sentence S802 is placed in the hypertext
document D82, a plurality of anchor sentences S803 to S805
are placed in the hypertext document D83, and an
anchor sentence S806 is placed in the hypertext
document D84. In each of the anchor sentences, either an
identifier identifying a document to which a user can
make reference or a position of a document to which a
user can make reference is embedded.

A document to which a user makes reference is
called a reference document in this specification, and a
document having one anchor sentence which indicates
one or more reference documents is called a parent
document in this specification. Also, each anchor
sentence is composed of one sentence or a plurality of
sentences.

Therefore, when a user reads the parent document
D81 displayed on a display of a browsed document
selecting means (called a browser) and points out a
position of the anchor sentence S801 of the parent
document D81 by using a so-called pointing device, the
reference document D83 is called and displayed, so that
the user can efficiently use the distributed hypertext
documents D80 to D86.

A group of the hypertext documents D80 to D86 is
written in a hypertext mark-up language, each
hypertext document is called a page, and a character
string, an image or a program is written in each
hypertext document. For example, in cases where the parent
document D81 is stored in a file named "farmer.html",
the reference document D83 is stored in a file named
"apple.html" and an indicator (or a document storing
position) indicating a reference to the reference
document D83 is embedded in a character string "apple
producing farmer" written in the parent document D81 to form
the anchor sentence S801, the anchor sentence S801
is expressed by "<a href="apple.html">apple producing
farmer</a>". In this case, because no sentence is
written in the reference document D83, there is a
case in which the document D82 is prepared in a computer
placed far from another computer, in which the
document D83 prepared before the preparation of the
document D81 is stored, and the document D82 functions as
a parent document for the reference document D83.
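
To make the anchor sentence format concrete, the following sketch extracts (document storing position, anchor sentence) pairs from a parent document. It uses the standard-library `html.parser` module; the class name `AnchorCollector` and the helper `extract_anchors` are hypothetical, and nested or malformed anchors are not handled.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None   # href of the anchor currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_anchors(html: str) -> list[tuple[str, str]]:
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return collector.anchors
```

Applied to the example of the parent document D81, the anchor sentence S801 yields the pair ("apple.html", "apple producing farmer").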

(First Embodiment) 

Fig. 3 is a block diagram of a hypertext retrieving 
apparatus according to a first embodiment of the 
present invention. 

As shown in Fig. 3, a hypertext retrieving apparatus 
1 for retrieving one or more hypertext documents likely 
to meet a user's retrieval request from a large volume of 
hypertext documents stored in a hypertext document 
managing unit 8 in which the hypertext documents pre- 
pared in a large number of computers widely distributed 
in a network of a world wide web are distributively man- 
aged on condition that the hypertext documents have 
reference relationships with each other, comprises 

a hypertext document table with parent document 
list preparing unit 7 for analyzing the hypertext doc- 
uments having the reference relationships which 
are managed by the hypertext document managing 
unit 8, preparing hypertext document information in 
which one or more parent document identifiers 
identifying one or more parent documents and 
anchor sentences of the parent documents are 
listed with one hypertext document identifier identi- 
fying one hypertext document and a document stor- 
ing position of the hypertext document, for each of 
the hypertext documents, and preparing a hyper- 
text document table of the hypertext document 
information for all hypertext documents managed 
by the hypertext document managing unit 8,
a retrieval index preparing unit 6 having a dictionary
for analyzing a body of one hypertext document, a 
title of the hypertext document and character 
strings of one or more anchor sentences of one or 
more parent documents corresponding to the 
hypertext document in advance for each of the 
hypertext documents managed by the hypertext 
document managing unit 8 according to the hyper- 
text document table prepared by the hypertext doc- 
ument table with parent document list preparing 
unit 7 to recognize a plurality of words appearing in 
the hypertext documents, preparing a piece of word 
information for one word in which one occurrence 
document identifier identifying one hypertext docu- 
ment, in which the word registered in the dictionary 
appears, and positional information indicating 
occurrence positions of the word in the title of the 
hypertext document, the body of the hypertext doc- 
ument and the anchor sentences of the parent doc- 
uments corresponding to the hypertext document 
are listed for each of the hypertext documents, and 
preparing a retrieval index of pieces of word infor- 
mation for the words stored in the dictionary, 
a keyword input unit 2 for receiving a plurality of 
keywords input by a user 9, 
a retrieving unit 3 for retrieving a plurality of pieces 
of particular word information corresponding to a
plurality of particular words agreeing with the keywords
received in the keyword input unit 2 from the
retrieval index prepared in the retrieval index preparing
unit 6 and retrieving particular occurrence
document identifiers identifying particular hypertext 
documents, in which one particular word agreeing 
with one keyword appears, and particular positional 
information indicating particular occurrence posi- 
tions of one particular word in the particular hyper- 
text documents and a plurality of particular parent 
documents corresponding to the particular hyper- 
text documents from the particular word information 
for each of the particular words, 
a document ranking determining unit 4 for unifying 
one particular hypertext document and one or more 
particular parent documents corresponding to the 
particular hypertext document to a unified particular 
hypertext document according to the document 
information of the hypertext document table pre- 
pared by the hypertext document table with parent 
document list preparing unit 7 for each of the partic- 
ular hypertext documents obtained in the retrieving 
unit 3, calculating an occurrence frequency TF of 
one particular word in one unified particular hyper- 
text document for each particular word and each 
unified particular hypertext document, calculating 
an inverse document frequency IDF defined as an 
inverse value of the number of particular hypertext 
documents, in which one particular word appears, 
for each particular word, calculating a product 
TF*IDF of one occurrence frequency TF and one 
inverse document frequency IDF, summing a plural- 
ity of products for all particular words to produce a 
summed product as an estimated value for each 
unified particular hypertext document, determining 
a plurality of importance degrees of the unified par- 
ticular hypertext documents according to the esti- 
mated values, determining the ranking of the 
particular hypertext documents according to the 
importance degrees for the unified particular hyper- 
text documents and preparing an index of one par- 
ticular hypertext document for each of the particular 
hypertext documents, and 
a retrieval result displaying unit 5 for displaying the 
indexes of the particular hypertext documents in the 
ranked order determined in the document ranking 
determining unit 4 as a retrieval result. 
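
The processing of the document ranking determining unit 4 can be sketched as follows. This is a simplified illustration, not the apparatus itself: words are split on whitespace, the unified document is formed by concatenating the document text with the anchor sentences of its parents, and the document count used for IDF is taken over the unified documents. These choices and the function name `rank` are assumptions.

```python
from collections import Counter


def rank(docs: dict[int, str], anchors: dict[int, list[str]],
         keywords: list[str]) -> list[int]:
    """docs: doc_id -> title and body text;
    anchors: doc_id -> anchor sentences of parent documents referring to it."""
    # Unify each document with the anchor sentences of its parent documents.
    unified = {
        doc_id: Counter((text + " " + " ".join(anchors.get(doc_id, []))).lower().split())
        for doc_id, text in docs.items()
    }
    scores: dict[int, float] = {}
    for keyword in keywords:
        word = keyword.lower()
        hits = [doc_id for doc_id, counts in unified.items() if counts[word] > 0]
        if not hits:
            continue
        idf = 1.0 / len(hits)  # inverse of the number of documents containing the word
        for doc_id in hits:
            tf = unified[doc_id][word]  # occurrence frequency in the unified document
            scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
    # Decreasing estimated value; ties are broken by identifier for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

A document whose own text never mentions a keyword can still be ranked, and even ranked first, when the anchor sentences of its parent documents use that keyword.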

In the above configuration, an operation of the
hypertext retrieving apparatus 1 is described. A plurality 
of hypertext documents having reference relationships 
with each other are prepared in a large number of com- 
puters widely distributed in a network of a world wide 
web. In the hypertext document managing unit 8, the 
hypertext documents are distributively managed. The 
hypertext document table with parent document list
preparing unit 7 has a related document collecting function
(generally called a web robot). Therefore, when a
plurality of document storing position addresses (generally
called universal resource locators) of a plurality of
hypertext documents are given to the hypertext
document table with parent document list preparing unit 7,
the plurality of hypertext documents are indicated as a 
plurality of parent documents by the universal resource 
locator one after another, one or more anchor sen- 
tences written in each of the parent documents are ana- 
lyzed, and one or more reference documents are 
collected for each of the parent documents. Thereafter, 
a plurality of hypertext document identifiers not over- 
lapped with each other are allocated to the collected ref- 
erence documents in the order of collection to identify 
the collected reference documents. In this case, when
no image or program is written in the collected
reference documents and only character strings are
written in them, the collecting time can be saved. Also,
the document storing position addresses of the collected
reference documents are listed so that a reference
document already listed is not collected again. Therefore,
as shown in Fig. 2, although the parent document
D83 relates to the reference document D84 according
to the anchor sentence S803 and the parent document
D84 relates to the reference document D83
according to the anchor sentence S806, the hypertext
documents D83 and D84 are not collected twice.
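
The collection procedure, including the list that prevents a document from being collected twice, can be sketched as a breadth-first traversal. The callback `links_of` (returning the addresses referenced by the anchor sentences of a document) is a hypothetical stand-in for fetching and analyzing a page, and assigning identifiers to the starting documents as well is an assumption of this sketch.

```python
from collections import deque
from typing import Callable, Iterable


def collect(start_urls: Iterable[str],
            links_of: Callable[[str], list[str]]) -> dict[str, int]:
    """Returns address -> identifier, allocated in the order of collection."""
    doc_ids: dict[str, int] = {}
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        if url in doc_ids:
            continue  # already listed, so it is not collected again
        doc_ids[url] = len(doc_ids) + 1
        queue.extend(links_of(url))
    return doc_ids
```

Because D83 and D84 refer to each other, a traversal without the listed addresses would never terminate; with the list, each document receives exactly one identifier.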

Thereafter, a hypertext document table of pieces of 
hypertext document information (refer to Fig. 4) in which 
parent document identifiers of one or more parent doc- 
uments and anchor sentences of the parent documents 
are listed for each hypertext document is prepared in 
the hypertext document table with parent document list 
preparing unit 7 according to a following procedure. A 
plurality of document information entry spaces DS1 to 
DS3 of which the number is equal to the number of col- 
lected reference documents are prepared. In each of 
the document information entry spaces, the number of 
one hypertext document identifier identifying one col- 
lected reference document and one document storing 
position address of the collected reference document 
are written in the document information entry space. 
Thereafter, a title of the collected reference document is 
extracted from the collected reference document by 
examining a plurality of character strings written in the 
collected reference document. In this embodiment, a 
title "apple that I grew" is, for example, extracted from a
character string "<title>apple that I grew</title>", and the
title is written in the document information entry space. 
Thereafter, one or more character strings of hypertext 
mark-up language tags respectively denoting a charac- 
ter string placed between "<" and ">" are removed from 
a plurality of character strings existing in a body of the 
collected reference document to form a text body, and 
the text body is written in the document information 
entry space. Thereafter, it is checked whether or not 
one or more anchor sentences relating to one reference 
document exist in one or more parent documents
relating to the reference document. In cases where an
anchor sentence exists in a parent document relating
to one reference document, a set of a parent document
identifier identifying the parent document and the
anchor sentence of the parent document is written in
the document information entry space to form a parent
document list for each piece of hypertext document
information. Also, a plurality of words used in the text body,
the title and the anchor sentences are written in the
document information entry space to form a word list for
each piece of hypertext document information.
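
The title extraction and tag removal steps of this procedure can be sketched with regular expressions. This is a rough illustration only: the function names are hypothetical, the title element is dropped from the text body as an assumption, and a regular expression is not a robust parser for arbitrary mark-up.

```python
import re


def extract_title(html: str) -> str:
    """Returns the character string between <title> and </title>, if any."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""


def text_body(html: str) -> str:
    """Removes every tag (a string placed between '<' and '>') to form a text body."""
    without_title = re.sub(r"<title>.*?</title>", " ", html,
                           flags=re.IGNORECASE | re.DOTALL)
    without_tags = re.sub(r"<[^>]*>", " ", without_title)
    return " ".join(without_tags.split())  # collapse runs of whitespace
```

The resulting title and text body are the values that would be written into the document information entry space.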

Therefore, in the reference document table with
parent document preparing unit 7, as shown in Fig. 3, a
document information entry space is prepared for each
of the hypertext documents managed by the hypertext
document managing unit 8, and a hypertext document
identifier, a document storing position, a title, a text body, a
parent document list and a word list are written in each
of the document information entry spaces to prepare a
hypertext document table.
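
The preparation of one document information entry can be sketched as follows. This is a minimal illustration under assumptions, not the apparatus itself: the names `DocumentEntry` and `make_entry`, the regular expressions, and the example address are hypothetical, and dropping the title element from the text body is a choice made for this sketch.

```python
import re
from dataclasses import dataclass, field

@dataclass
class DocumentEntry:
    doc_id: str                  # hypertext document identifier
    position: str                # document storing position address
    title: str = ""
    body: str = ""               # text body with mark-up tags removed
    parents: list = field(default_factory=list)  # (parent id, anchor sentence) pairs
    words: list = field(default_factory=list)    # word list

TAG = re.compile(r"<[^>]*>")     # any character string placed between "<" and ">"
TITLE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def make_entry(doc_id, position, html, parent_anchors):
    """Build one entry of the hypertext document table (hypothetical helper)."""
    match = TITLE.search(html)
    title = match.group(1).strip() if match else ""
    # Assumption: the title element is not repeated in the text body.
    without_title = TITLE.sub(" ", html)
    body = " ".join(TAG.sub(" ", without_title).split())
    texts = [title, body] + [anchor for _, anchor in parent_anchors]
    words = sorted({w.lower() for t in texts for w in re.findall(r"\w+", t)})
    return DocumentEntry(doc_id, position, title, body, list(parent_anchors), words)

entry = make_entry(
    "000083",
    "http://example.com/apple.html",   # hypothetical address
    "<html><title>apple that I grew</title><body><p>This apple is red.</p></body></html>",
    [("000081", "apple diary")],
)
```

The word list here is a sorted set of lower-cased words; the specification does not say how words are segmented, so a real system would substitute its own dictionary-based analysis.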

In this embodiment, the hypertext document table is 
prepared after one or more anchor sentences written in 
each of the parent documents are analyzed to collect 
the reference documents. Therefore, the anchor sentences
are analyzed or checked twice to determine the
collected reference documents and prepare the hypertext
document table. However, in cases where the
hypertext document table is prepared while analyzing
the anchor sentences to collect the reference documents,
the hypertext document table can be efficiently
prepared.

Thereafter, in the retrieval index preparing unit 6
having a dictionary, a body of a hypertext document, a
title of the hypertext document and character strings of
one or more anchor sentences of the hypertext document
are analyzed in advance for each of the hypertext
documents of the hypertext document table, a piece of
word information composed of a word, one or more
occurrence document identifiers identifying hypertext
documents in which the word appears, and positional
information indicating occurrence positions of the word
in the hypertext documents is prepared for each of a
plurality of words stored in the dictionary, and a retrieval
index of pieces of word information for the plurality of
words is prepared as shown in Fig. 5.

In detail, tens of thousands of words are registered in
the dictionary of the retrieval index preparing unit 6,
a plurality of word information entry spaces WS1 to
WS3, of which the number is equal to the number of
words registered in the dictionary, are prepared, and
each of the words is written in one of the word information
entry spaces WS1 to WS3. Thereafter, a word registered
in the word list of one document information
entry space of the hypertext document table is detected
as a particular word, a hypertext document identifier of
a particular hypertext document corresponding to the
document information entry space is detected as an
occurrence hypertext document identifier, one or more
positions of the particular word in the particular hypertext
document are detected as positional information,
and a set of the occurrence hypertext document identifier
and the positional information is written as word
information in a particular word information entry space
corresponding to the particular word. This processing is
performed for each of the words registered in the word 
lists of all document information entry spaces of the 
hypertext document table, so that a retrieval index of the 
pieces of word information corresponding to a plurality 
of words used in the hypertext documents is prepared. 

Fig. 5 shows a piece of word information of the 
retrieval index which is written in the word information 
entry space WS1 and corresponds to a word "apple". 
"(Title, 1)" indicates that the word "apple" appears in the 
first word position of the title of the hypertext document 
D83, "(Body,4,33,43)" indicates that the word "apple" 
appears in the fourth, 33-th and 43-th word positions of 
the body of the hypertext document D83. "(000081,1)" 
indicates that the word "apple" appears in the first word 
position of the anchor sentence S801 of the hypertext 
document D81 functioning as the parent document, and 
"(000082,4)" indicates that the word "apple" appears in 
the fourth word position of the anchor sentence S802 of 
the hypertext document D82 functioning as the parent 
document. 
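
The word information described above resembles a positional inverted index. The following sketch shows one way such an index could be built; the function names, the nested-dictionary layout and the sample sentences are assumptions made for illustration, and only the "(Title, 1)", "(000081,1)" and "(000082,4)" style of entries follows Fig. 5.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lower-cased words (simplified stand-in for dictionary analysis)."""
    return re.findall(r"\w+", text.lower())

def build_index(table):
    """table: doc_id -> {"title": str, "body": str, "anchors": {parent_id: sentence}}.

    Returns word -> doc_id -> field -> list of 1-based word positions, where a
    field is "Title", "Body" or the identifier of a parent document.
    """
    index = defaultdict(dict)
    for doc_id, entry in table.items():
        fields = {"Title": entry["title"], "Body": entry["body"]}
        fields.update(entry["anchors"])
        for name, text in fields.items():
            for pos, word in enumerate(tokenize(text), start=1):
                index[word].setdefault(doc_id, {}).setdefault(name, []).append(pos)
    return index

# Hypothetical sample data; only the positions of "apple" matter here.
table = {
    "000083": {
        "title": "apple that I grew",
        "body": "I picked an apple today",
        "anchors": {"000081": "apple I grew", "000082": "photo of an apple"},
    }
}
index = build_index(table)
apple_info = index["apple"]["000083"]
```

Keeping the anchor-sentence occurrences under the parent document identifier lets the ranking step distinguish occurrences in the document itself from occurrences contributed by its parents.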

Also, it is applicable that an inverse value of the 
number of occurrence documents in which a word 
appears (generally called an inverse document fre- 
quency IDF) and the occurrence frequency of the word 
in each of the occurrence documents (generally called a 
text frequency TF) be calculated in advance in the 
retrieval index preparing unit 6 and written in a corre- 
sponding word information entry space for each of the 
words. Therefore, a processing time required for the 
retrieval can be shortened. 

Therefore, in the retrieval index preparing unit 6, 
each of the words appearing in the text body of the 
hypertext document, the title of the hypertext document 
and the anchor sentences of the parent documents 
relating to the hypertext document is analyzed, and an 
occurrence document list composed of one or more 
occurrence document identifiers and the positional 
information is prepared for each word. Accordingly, a 
retrieval index in which word appearing positions in 
each of the hypertext documents are indicated for each 
word can be prepared. 

The keyword input unit 2 has a function of a text box 
and a retrieval starting button for returning contents of 
the text box, and an HTML document written according 
to the hypertext mark-up language having a title such as 
"retrieval page" is employed for the keyword input unit 2. 
That is, the user 9 calls the HTML document in the world 
wide web browser such as Mosaic or Netscape oper- 
ated in his own client computer, a single keyword is 
input to the text box or a plurality of keywords divided by 
spaces are input to the text box, and the retrieval start- 
ing button is pushed. Therefore, the single keyword or 
keywords are input. 

Therefore, a plurality of keywords input by the user
9 are received in the keyword input unit 2 and are transmitted
to the retrieving unit 3. In this embodiment, the
user inputs each of the keywords by pushing a plurality 
of keys arranged on a keyboard. However, in cases 
where each of a plurality of candidates for a keyword is 
selected by pushing a button, a keyword input operation 
using the pointing device can be easily performed with- 
out using any keyboard even though an unskilled per- 
son operates the keyword input unit 2. 

In the retrieving unit 3, pieces of particular word 
information corresponding to a plurality of particular 
words, which agree with the keywords input by the key- 
word input unit 2, are extracted from the retrieval index 
stored in the retrieval index preparing unit 6, and one or 
more occurrence document identifiers identifying one or 
more particular hypertext documents, in which one par- 
ticular word agreeing with one keyword appears, and 
positional information indicating positions of the particu- 
lar word in the particular hypertext documents are 
obtained from one piece of word information for each of 
the particular words. A plurality of sets of the occurrence 
document identifiers and the positional information are 
transmitted to the document ranking determining unit 4. 

In the document ranking determining unit 4, pieces 
of hypertext document information corresponding to the 
particular hypertext documents identified by the occur- 
rence document identifiers are extracted from the 
hypertext document table, and one particular hypertext 
document and one or more parent documents identified 
by one or more parent document identifiers listed in one 
piece of hypertext document information corresponding 
to the particular hypertext document are unified into a
unified particular hypertext document. The unified particular
hypertext document is formed for each of the particular
hypertext documents which are identified by the
occurrence document identifiers transmitted from the 
retrieving unit 3. Thereafter, an inverse document fre- 
quency IDF defined as an inverse value of the number 
of unified particular hypertext documents in which one 
particular word agreeing with one keyword appears and 
the occurrence frequency TF of one particular word in 
each of the unified particular hypertext documents are 
calculated for each of the particular words according to 
the plurality of sets of the occurrence document identifi- 
ers and the positional information. The inverse docu- 
ment frequency IDF denotes a correction factor for each 
particular word. 

Thereafter, in cases where only one keyword is
input, an estimated value obtained by multiplying the
inverse document frequency IDF for one particular word
and the occurrence frequency TF together is calculated
as an importance degree for each of the unified particular
hypertext documents. Also, in cases where the
number of keywords input by the user is two or more, a
product TF*IDF of one occurrence frequency TF and
one inverse document frequency IDF is calculated for
each keyword and each unified particular hypertext document,
a sum of the products calculated for all keywords
is adopted as an estimated value for each of the unified
particular hypertext documents, and an importance
degree for each of the unified particular hypertext documents
is determined according to the estimated values.
The importance degree for each unified particular
hypertext document is set as an importance degree for
one particular hypertext document corresponding to the
unified particular hypertext document. Thereafter, the
ranking of the particular hypertext documents including
the parent documents is determined according to the
importance degrees of the particular hypertext documents.
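
The calculation of the estimated values can be sketched as follows. The sketch takes IDF literally as the inverse of the number of unified documents in which a word appears and TF as the occurrence count in a unified document; the function name `rank`, the tie-breaking by identifier and the sample counts are assumptions made for this illustration.

```python
def rank(postings, keywords):
    """postings: word -> {doc_id: occurrence count in the unified document}.

    Returns (doc_id, estimated value) pairs, highest estimated value first.
    """
    scores = {}
    for keyword in keywords:
        docs = postings.get(keyword, {})
        if not docs:
            continue                      # the keyword appears in no document
        idf = 1.0 / len(docs)             # inverse of the number of occurrence documents
        for doc_id, tf in docs.items():
            # Sum of the products TF*IDF over all keywords.
            scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
    # Ties are broken by identifier (an assumption of this sketch).
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical occurrence counts.
postings = {"apple": {"D83": 6, "D85": 2}, "pie": {"D85": 1}}
single = rank(postings, ["apple"])
double = rank(postings, ["apple", "pie"])
```

With one keyword the estimated value reduces to a single product TF*IDF, which matches the single-keyword case described above.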

In cases where the number of keywords is two or
more, it is applicable that an estimated value for one
particular hypertext document be set to a value N times
(N is two or more) as high as a sum of the products
TF*IDF calculated for all keywords when N particular
words agreeing with N keywords appear in the particular
hypertext document. In this case, because the correlation
among the N keywords is reflected in the
importance degree for each particular hypertext document,
the user's retrieval request can be better satisfied.

Also, in cases where two particular words agreeing
with two keywords are used in one particular hypertext
document close to each other within 20 characters, it is
applicable that an estimated value for the unified particular
hypertext document be doubled. In this case,
because the correlation between the two keywords
close to each other is reflected in the importance
degree for each particular hypertext document, the
user's retrieval request can be better satisfied.
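
The two optional corrections for multiple keywords can be illustrated with a short sketch. Both functions are hypothetical helpers; measuring closeness as the absolute difference between character offsets is an assumption, since the text only states "within 20 characters".

```python
def co_occurrence_boost(sum_tfidf, n_matched):
    """Multiply the summed TF*IDF by N when N (two or more) keywords appear."""
    return sum_tfidf * n_matched if n_matched >= 2 else sum_tfidf

def proximity_boost(value, offsets_a, offsets_b, window=20):
    """Double the value when two keywords occur within `window` characters.

    offsets_a, offsets_b: character offsets of the two keywords in the document.
    """
    near = any(abs(a - b) <= window for a in offsets_a for b in offsets_b)
    return value * 2 if near else value

boosted = co_occurrence_boost(1.5, 2)              # two keywords matched
doubled = proximity_boost(3.0, [10, 200], [25])    # offsets 10 and 25 are 15 apart
unchanged = proximity_boost(3.0, [10], [100])      # 90 characters apart
```

Whether the two corrections are applied together or separately is not stated in the text, so they are kept as independent functions here.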

Thereafter, in the document ranking determining
unit 4, an HTML document, in which a plurality of
indexes of the particular hypertext documents are listed
in the ranked order, is prepared and transmitted to the
retrieval result displaying unit 5. In this case, the index of
one particular hypertext document is a title of the particular
hypertext document or a character string of an
anchor sentence written in one of the parent documents,
a document storing position address indicating
a position of the particular hypertext document in the
hypertext document managing unit 8 is embedded in the
index of the particular hypertext document, and the
index functions as an anchor sentence. That is, when
the user selects one index of one particular hypertext
document, the particular hypertext document is called
from the hypertext document managing unit 8 according
to the document storing position address.
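
A result page of this kind could be generated as sketched below. The function name `result_page`, the page title and the use of an ordered list are assumptions; the text only requires that each index be listed in the ranked order and carry the document storing position address so that it functions as an anchor.

```python
from html import escape

def result_page(ranked):
    """ranked: list of (index text, storing position address) in ranked order."""
    items = "\n".join(
        # The storing position address is embedded in the index as a link target.
        f'<li><a href="{escape(address, quote=True)}">{escape(text)}</a></li>'
        for text, address in ranked
    )
    return (
        "<html><head><title>retrieval result</title></head>"
        f"<body><ol>\n{items}\n</ol></body></html>"
    )

# Hypothetical address; the index text is the title of the document.
page = result_page([("apple that I grew", "http://example.com/apple.html")])
```

Escaping the index text and the address keeps titles that contain mark-up characters from breaking the generated page.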

Therefore, in the document ranking determining
unit 4, one or more parent documents having a reference
relationship with one particular hypertext document
are extracted from the hypertext document table
prepared in the reference document table with parent
document preparing unit 7 for each particular hypertext
document, one particular hypertext document and one
or more parent documents having a reference relationship
with the particular hypertext document are unified
into a unified particular hypertext document for each particular
hypertext document, an importance degree of
the particular hypertext document including the parent
documents is determined according to an estimated
value TF*IDF for each particular hypertext document,
the particular hypertext documents are ranked according
to those importance degrees, and the particular
hypertext documents are listed in the ranked order.

In this embodiment, the occurrence frequency TF of
the word is not normalized because the occurrence frequency
TF is not divided by a size of one unified particular
hypertext document. However, in cases where the
occurrence frequency TF of the word is normalized by
dividing the occurrence frequency TF by a size of one
unified particular hypertext document, it is required that
a size of each hypertext document be written in the
hypertext document table.
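
The effect of the normalization can be seen in a small sketch. Taking the size of a unified document to be its word count is an assumption, because the unit of the size is not specified, and the counts below are hypothetical.

```python
def normalized_tf(occurrences, document_size):
    """Occurrence frequency divided by the size of the unified document."""
    if document_size <= 0:
        raise ValueError("document_size must be positive")
    return occurrences / document_size

# A short document with few occurrences can outrank a long one with more.
short_doc = normalized_tf(3, 60)     # 3 occurrences in 60 words
long_doc = normalized_tf(6, 600)     # 6 occurrences in 600 words
```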

The retrieval result displaying unit 5 is embodied by
the world wide web browser such as Mosaic or Netscape
operated in the user's own client computer. The HTML
document prepared in the document ranking determining
unit 4 is displayed on a display of the client computer.
Thereafter, when the user selects one index of one
particular hypertext document listed in the HTML document
by using a pointing device, a position of the particular
hypertext document selected by the user is
ascertained according to the document storing position
address embedded in the index of the particular hypertext
document, and the particular hypertext document is
called from the hypertext document managing unit 8.

Therefore, in the retrieval result displaying unit 5, 
the indexes of the particular hypertext documents listed
in the HTML document are displayed, the user selects 
one index of one particular hypertext document, and the 
particular hypertext document selected by the user is 
called from the hypertext document managing unit 8. 

Accordingly, because one or more parent documents
having a reference relationship with each reference
document are listed in the hypertext document
table prepared by the reference document table with
parent document preparing unit 7, the parent documents
corresponding to one reference document can
be specified by extracting the document information corresponding
to the reference document from the hypertext
document table. Therefore, because it is not
required to ask the hypertext document managing unit 8
for one or more parent documents corresponding to the
reference document, one or more parent documents
corresponding to each reference document can be
quickly ascertained.

Also, because one particular hypertext document
and one or more parent documents having a reference
relationship with the particular hypertext document are
unified as a unified particular hypertext document in
the document ranking determining unit 4, an importance
degree can be determined for each of the unified particular
hypertext documents. Therefore, the ranking of the
particular hypertext documents in which one particular
word agreeing with one keyword appears can be determined
according to the importance degrees while considering
the parent documents corresponding to each
particular hypertext document. Accordingly, the indexes
of the particular hypertext documents can be displayed
by the retrieval result displaying unit 5 according to the
ranking of the particular hypertext documents on condition
that the user's retrieval request expressed by the
keyword is reliably satisfied, and the user can select
the particular hypertext documents in the ranked order.

Also, because one hypertext document and one or 
more anchor sentences of one or more parent docu- 
ments having a reference relationship with the hypertext 
document are listed in each piece of document informa- 
tion of the hypertext document table prepared by the ref- 
erence document table with parent document preparing 
unit 7, each piece of word information of the retrieval 
index indicating that a word appears in one hypertext 
document and one or more anchor sentences of one or 
more parent documents having a reference relationship 
with the hypertext document can be easily prepared in 
the retrieval index preparing unit 6. In addition, because 
one or more parent documents having a reference rela- 
tionship with each reference document are listed in the 
hypertext document table prepared by the reference 
document table with parent document preparing unit 7, 
when the retrieval index is prepared in the retrieval 
index preparing unit 6, it is not required to ask the hyper- 
text document managing unit 8 for one or more parent 
documents corresponding to the reference document. 
Therefore, the retrieval index can be quickly prepared. 

(Second Embodiment) 

Fig. 6 is a block diagram of a hypertext retrieving
apparatus according to a second embodiment of the 
present invention. 

As shown in Fig. 6, a hypertext retrieving apparatus 
11 for retrieving one or more hypertext documents likely
to meet a user's retrieval request from a large volume of 
hypertext documents stored in the hypertext document 
managing unit 8, comprises the hypertext document 
table with parent document list preparing unit 7, the 
retrieval index preparing unit 6, the keyword input unit 2, 
the retrieving unit 3, 

a document ranking determining unit 12 for unifying
one particular hypertext document and one or more 
particular parent documents corresponding to the 
particular hypertext document to a unified particular 
hypertext document according to the document 
information of the hypertext document table pre- 
pared by the hypertext document table with parent 
document list preparing unit 7 for each of the partic- 
ular hypertext documents obtained in the retrieving 
unit 3, calculating estimated values for the unified 
particular hypertext documents according to the 
particular word information of the retrieval index 
obtained in the retrieval index preparing unit 6, 
determining a plurality of importance degrees of the 
unified particular hypertext documents according to 
the estimated values, determining the ranking of 
the particular hypertext documents according to the
importance degrees for the unified particular hypertext
documents and preparing an index of one particular
hypertext document with an index of a
particular parent document corresponding to the
particular hypertext document for each of the particular
hypertext documents, and
a retrieval result displaying unit 13 for displaying the
index of the particular hypertext document with the
index of the particular parent document for each of
the unified particular hypertext documents in the 
ranked order determined in the document ranking 
determining unit 12 as a retrieval result. 

In the above configuration, after the ranking of the
particular hypertext documents is determined according
to the importance degrees in the document ranking
determining unit 12 in the same manner as in the first
embodiment, not only an index of one particular hypertext
document but also an index of a particular parent
document corresponding to the particular hypertext
document are prepared for each of the particular hypertext
documents. In cases where a plurality of parent
documents corresponding to the particular hypertext
document exist, one parent document of which the document
storing position is closest to that of the particular
hypertext document among those of the parent documents
is selected as the particular parent document.
This selection is performed by comparing a portion of a
character string indicating the document storing position
of each parent document with a portion of a character
string indicating the document storing position of the
particular hypertext document. Also, in this embodiment,
the particular parent document (or a first-stage
particular parent document) is regarded as a second-stage
reference document, a second-stage particular
parent document having a reference relationship with
the second-stage reference document is specified, and
an index of the second-stage particular parent document
is prepared. Thereafter, the index of one particular
hypertext document is displayed with the index of the
first-stage particular parent document and the index of
the second-stage particular parent document for each
particular hypertext document by the retrieval result displaying
unit 13.
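
The selection of the first-stage and second-stage particular parent documents can be sketched as follows. The text does not define how closeness of storing positions is measured; this sketch assumes the parent sharing the longest common leading portion of the address string is the closest. The function names and the addresses are hypothetical.

```python
import os

def closest_parent(doc_position, parent_positions):
    """parent_positions: parent_id -> storing position address.

    Returns the parent whose address shares the longest prefix with doc_position.
    """
    def shared(a, b):
        return len(os.path.commonprefix([a, b]))
    # Sorting first makes ties resolve to the smallest identifier.
    return max(sorted(parent_positions),
               key=lambda pid: shared(doc_position, parent_positions[pid]))

def parent_chain(doc_id, positions, parents, stages=2):
    """Return the first-stage, second-stage, ... particular parent documents."""
    chain, current = [], doc_id
    for _ in range(stages):
        candidates = {p: positions[p] for p in parents.get(current, [])}
        if not candidates:
            break                       # no parent document is registered
        current = closest_parent(positions[current], candidates)
        chain.append(current)
    return chain

positions = {
    "D80": "http://a.example/index.html",
    "D81": "http://a.example/fruit/index.html",
    "D82": "http://b.example/links.html",
    "D83": "http://a.example/fruit/apple.html",
}
parents = {"D83": ["D81", "D82"], "D81": ["D80"]}
chain = parent_chain("D83", positions, parents)
```

For the document D83 this yields D81 as the first-stage parent and D80 as the second-stage parent, which is the grouping the second embodiment displays for D83.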

Fig. 7 shows an example of the index of one particular
hypertext document displayed with the index of the
first-stage particular parent document and the index of
the second-stage particular parent document for each
particular hypertext document by the retrieval result displaying
unit 13.

As shown in Fig. 7, in cases where the fourth rank
is given to the hypertext document D83, the 18-th rank
is given to the hypertext document D85 and the 19-th
rank is given to the hypertext document D86, the index
of the particular hypertext document D83 is displayed
with the index of the first-stage particular parent document
D81 and the index of the second-stage particular
parent document D80 as a fourth ranking group, the
index of the particular hypertext document D85 is displayed
with the index of the first-stage particular parent
document D83 and the index of the second-stage particular
parent document D81 as an 18-th ranking group,
and the index of the particular hypertext document D86
is displayed with the index of the first-stage particular
parent document D83 and the index of the second-stage
particular parent document D81 as a 19-th ranking
group.

Accordingly, even though the hypertext document 
D86 having no anchor sentence is selected as one par- 
ticular hypertext document, the hypertext document 
D83 or D81 having a close relation with the hypertext 
document D86 can be easily selected and called from 
the hypertext document managing unit 8 without relying 
on any anchor sentence. That is, because a plurality of 
hypertext documents having a reference relationship 
with each other closely relate to each other, the display 
of the indexes of the first-stage and second-stage particular
parent documents is very useful for the user.

(Third Embodiment) 

In the first or second embodiment, suppose that the user
calls and reads the hypertext document D83 of the fourth
rank, calls and reads the hypertext document D85 by
selecting the position of the anchor sentence S804, and
then calls and reads a plurality of hypertext documents of
lower ranks following the fourth rank one by one. In this
case, there is a possibility that the hypertext document
D85 of the 18-th rank is erroneously called and read again
because the user has forgotten that the hypertext document
D85 has already been read. Also, even though the
hypertext document D86 of the 19-th rank is called and
read, because a long time has elapsed since the hypertext
document D83 of the fourth rank was called and read,
there is a possibility that the user cannot understand the
context of the hypertext document D86 closely relating
to the context of the hypertext document D83. Therefore, to
solve the above drawbacks, in the third embodiment the
ranks given to a plurality of hypertext documents closely
relating to each other are set to the same rank.

Fig. 8 is a block diagram of a hypertext retrieving 
apparatus according to a third embodiment of the 
present invention. 

As shown in Fig. 8, a hypertext retrieving apparatus 
21 for retrieving one or more hypertext documents likely 
to meet a user's retrieval request from a large volume of 
hypertext documents stored in the hypertext document 
managing unit 8, comprises the hypertext document 
table with parent document list preparing unit 7, the 
retrieval index preparing unit 6, the keyword input unit 2, 
the retrieving unit 3, 

a document ranking determining unit 22 for unifying 
one particular hypertext document and one or more 
particular parent documents corresponding to the 
particular hypertext document to a unified particular 
hypertext document according to the document
information of the hypertext document table pre- 
pared by the hypertext document table with parent 
document list preparing unit 7 for each of the partic- 
ular hypertext documents obtained in the retrieving 
unit 3, calculating estimated values for the unified 
particular hypertext documents according to the 
particular word information of the retrieval index 
obtained in the retrieval index preparing unit 6, 
determining a plurality of importance degrees of the 
unified particular hypertext documents according to 
the estimated values, determining the ranking of 
the particular hypertext documents according to the 
importance degrees for the unified particular hyper- 
text documents on condition that ranks given to two 
or more particular hypertext documents closely 
relating to each other are set to the same rank and 
preparing an index of one particular hypertext doc- 
ument for each of the particular hypertext docu- 
ments, and 

a retrieval result displaying unit 23 for displaying the 
indexes of the particular hypertext documents in the 
ranked order determined in the document ranking 
determining unit 22 as a retrieval result on condition 
that two or more particular hypertext documents set 
to the same rank are displayed with one or more 
particular parent documents corresponding to any 
of the particular hypertext documents in common in 
a group. 

In the above configuration, after the importance 
degrees of the particular hypertext documents are cal- 
culated and the ranking of the particular hypertext doc- 
uments is determined according to the importance 
degrees in the document ranking determining unit 22 in 
the same manner as in the first embodiment, one or 
more parent document identifiers listed in one piece of 
document information of the hypertext document table 
corresponding to one particular hypertext document are 
extracted, and one or more parent documents identified 
by the parent document identifiers are specified for each 
particular hypertext document. Thereafter, it is judged 
whether or not each of the parent documents agrees 
with one of the particular hypertext documents. In cases 
where one parent document corresponding to a first 
particular hypertext document of a rank A agrees with a 
second particular hypertext document of a rank B, it is
judged that the first and second particular hypertext
documents closely relate to each other, and the first and
second particular hypertext documents are reset to the
higher of the ranks A and B. Thereafter,
indexes of the particular hypertext documents are dis- 
played in the ranked order by the retrieval result display- 
ing unit 23. 

For example, because the parent document D83
corresponding to the hypertext document D85 of the 18-th
rank agrees with the hypertext document D83 of the
fourth rank, the hypertext document D85 is reset to the
fourth rank. Also, because the parent document D83
corresponding to the hypertext document D86 of the 19-th
rank agrees with the hypertext document D83 of the
fourth rank, the hypertext document D86 is reset to the
fourth rank.
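
The resetting of ranks in this example can be sketched as follows. The function name `merge_ranks` is hypothetical, and repeating the comparison until no rank changes is an assumption of this sketch; the text itself only describes a pairwise comparison between a particular hypertext document and its parent document.

```python
def merge_ranks(ranks, parents):
    """ranks: doc_id -> rank (1 is the highest); parents: doc_id -> parent ids.

    A retrieved document and a retrieved parent of it are reset to the
    higher (numerically smaller) of their two ranks.
    """
    merged = dict(ranks)
    changed = True
    while changed:                        # repeat until no rank changes
        changed = False
        for doc_id, parent_ids in parents.items():
            if doc_id not in merged:
                continue
            for pid in parent_ids:
                if pid not in merged:
                    continue              # the parent itself was not retrieved
                best = min(merged[doc_id], merged[pid])
                if merged[doc_id] != best or merged[pid] != best:
                    merged[doc_id] = merged[pid] = best
                    changed = True
    return merged

ranks = {"D83": 4, "D85": 18, "D86": 19}
parents = {"D83": ["D81", "D82"], "D85": ["D83"], "D86": ["D83"]}
merged = merge_ranks(ranks, parents)
```

The parent documents D81 and D82 are not among the retrieved documents in this sample, so they do not affect the result.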

Therefore, because a plurality of particular hypertext
documents closely relating to each other are set to
the same rank and are displayed close to each other,
the user can consecutively read the particular hypertext
documents closely relating to each other, so that the user
can easily grasp the contexts of the particular hypertext
documents. Accordingly, the same particular hypertext
document is prevented from being erroneously read
again, and the user can efficiently read a group of particular
hypertext documents closely relating to each other.

In this embodiment, a plurality of particular hypertext
documents closely relating to each other are set to
the highest rank among the ranks given to the plurality
of particular hypertext documents. However, the third
embodiment is not limited to this concept. That is, when
a plurality of particular hypertext documents closely
relating to each other are determined, it is applicable that
a sum of the importance degrees of the particular
hypertext documents be calculated and the particular
hypertext documents be reset to the same higher rank
according to the summed importance degree.

Also, it is preferred that the concept of the second
embodiment and the concept of the third embodiment
be combined. For example, as shown in Fig. 7, when a
first group of the particular hypertext document D83 and
the parent documents D80 and D81 is set to the fourth
rank, a second group of the particular hypertext document
D85 and the parent documents D81 and D83 is
set to the 18-th rank and a third group of the particular
hypertext document D86 and the parent documents
D81 and D83 is set to the 19-th rank according to the
second embodiment, the second group of documents
D81, D83 and D85 set to the 18-th rank is reset to the
fourth rank, the third group of documents D81, D83
and D86 set to the 19-th rank is reset to the fourth rank,
and a combined group of the particular hypertext documents
D83, D85 and D86 and the parent documents
D80 and D81 reset to the fourth rank is displayed as
shown in Fig. 9.

(Fourth Embodiment)

In general, a special word indicating a feature of a
reference document appears many times in one or more
anchor sentences of one or more parent documents
corresponding to the reference document. Therefore, in
cases where an estimated value for the reference document
is calculated by considering the special word
appearing in the anchor sentences of the parent documents
and the reference document is ranked according
to the estimated value, reliability for the retrieval of a plurality
of hypertext documents likely to meet a user's
retrieval request can be improved.

Fig. 10 is a block diagram of a hypertext retrieving 
apparatus according to a fourth embodiment of the 
present invention.

As shown in Fig. 10, a hypertext retrieving apparatus
31 for retrieving one or more hypertext documents
likely to meet a user's retrieval request from a large volume
of hypertext documents stored in the hypertext
document managing unit 8, comprises the hypertext
document table with parent document list preparing unit
7, the retrieval index preparing unit 6, the keyword input
unit 2, the retrieving unit 3,

a document ranking determining unit 32 for calculating
an occurrence frequency of each particular
word in one particular hypertext document and one
or more anchor sentences of one or more particular
parent documents corresponding to the particular
hypertext document as a revised occurrence frequency
TF for the particular hypertext document for
each of the particular hypertext documents according
to the particular word information of the retrieval
index obtained in the retrieval index preparing unit
6, calculating estimated values of the particular
hypertext documents according to the revised
occurrence frequencies TF and inverse document
frequencies IDF, determining a plurality of importance
degrees of the particular hypertext documents
according to the estimated values,
determining the ranking of the particular hypertext
documents according to the importance degrees
and preparing indexes of the particular hypertext
documents, and

a retrieval result displaying unit 33 for displaying the
indexes of the particular hypertext documents in the
ranked order determined in the document ranking
determining unit 32 as a retrieval result.

In the above configuration, in cases where the user 
inputs a keyword "apple", as shown in Fig. 4, the particular
word "apple" appears four times in the title of the
hypertext document D83 and the body of the hypertext 
document D83. Also, the particular word "apple"
appears in the anchor sentence S801 of the parent doc- 
ument D81 and the anchor sentence S802 of the parent 
document D82. Therefore, because a sum of an occur- 
rence frequency of the particular word "apple" in the 
hypertext document D83 and the anchor sentences
S801 and S802 of the parent documents D81 and D82 
is 6, a revised occurrence frequency TF for the particu- 
lar hypertext document D83 is set to 6, and an esti- 
mated value of the particular hypertext document D83 is 
calculated by using the revised occurrence frequency
TF in the document ranking determining unit 32. 
Accordingly, the particular hypertext document D83 is 
ranked to a higher rank, so that reliability of the retrieval 
of the particular hypertext document D83 can be 
improved.
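
The counting in the above example can be illustrated by a minimal Python sketch. The function names count_word and revised_tf, and the sample texts standing in for the hypertext document D83 and the anchor sentences S801 and S802, are assumptions made for illustration only; whole-word, case-insensitive matching is likewise an assumption and is not specified in this embodiment.

```python
import re


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sum(1 for token in tokens if token == word.lower())


def revised_tf(document_text, anchor_sentences, word):
    """Occurrences in the document plus occurrences in the anchor
    sentences of its parent documents (the revised frequency TF)."""
    in_document = count_word(document_text, word)
    in_anchors = sum(count_word(s, word) for s in anchor_sentences)
    return in_document + in_anchors


# Hypothetical stand-ins for D83 (title and body) and for S801 and S802.
d83_text = "Apple guide. An apple a day. Apple pie and apple juice are popular."
anchor_sentences = ["See the apple guide", "Aomori apple growers"]

tf = revised_tf(d83_text, anchor_sentences, "apple")  # 4 in D83 + 1 + 1
```

With these sample texts the word appears four times in the document and once in each anchor sentence, which reproduces the revised occurrence frequency of 6 described above.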

(Fifth Embodiment) 

In the first to fourth embodiments, in cases where
the user desires to know an outline of contents of one
particular hypertext document when an index of the par- 
ticular hypertext document is displayed, it is required to 
call the particular hypertext document from the hyper- 
text document managing unit 8. Therefore, in cases 
where the user desires to read contents of many particular
hypertext documents, it is troublesome for the user to
call each of the particular hypertext documents.

Fig. 11 is a block diagram of a hypertext retrieving
apparatus according to a fifth embodiment of the 
present invention. 

As shown in Fig. 11, a hypertext retrieving apparatus
41 for retrieving one or more hypertext documents
likely to meet a user's retrieval request from a large vol- 
ume of hypertext documents stored in the hypertext 
document managing unit 8, comprises the hypertext
document table with parent document list preparing unit 
7, the retrieval index preparing unit 6, the keyword input 
unit 2, the retrieving unit 3, 

a document ranking determining unit 42 for unifying 
one particular hypertext document and one or more 
particular parent documents corresponding to the 
particular hypertext document to a unified particular 
hypertext document according to the document 
information of the hypertext document table pre- 
pared by the hypertext document table with parent 
document list preparing unit 7 for each of the partic- 
ular hypertext documents obtained in the retrieving 
unit 3, calculating estimated values for the unified 
particular hypertext documents for each particular 
word according to the particular word information of 
the retrieval index obtained in the retrieval index 
preparing unit 6, determining a plurality of impor- 
tance degrees of the unified particular hypertext 
documents according to the estimated values for 
each particular word, determining the ranking of the 
particular hypertext documents according to the 
importance degrees for the unified particular hyper- 
text documents for each particular word, preparing 
an index of one particular hypertext document for 
each of the particular hypertext documents and 
preparing a plurality of summaries of the particular 
hypertext documents for each of the particular 
words, and 

a retrieval result displaying unit 43 for displaying a 
group of the indexes of the particular hypertext doc- 
uments with the summaries of the particular hyper- 
text documents in the ranked order determined in 
the document ranking determining unit 42 for each 
particular word as a retrieval result. 

In the above configuration, after the indexes of the 
particular hypertext documents are prepared in the doc- 
ument ranking determining unit 42, a particular sen- 
tence or a particular phrase including one particular 
word is extracted from one particular hypertext docu- 
ment according to the positional information of the word 
information of the retrieval index prepared by the
retrieval index preparing unit 6, and a summary in which
the particular sentence or the particular phrase is writ- 
ten in succession to a top sentence or a top phrase of 
the particular hypertext document is prepared for each 
of the particular words and each of the particular hypertext
documents. In cases where a plurality of particular
sentences or a plurality of particular phrases including 
one particular word exist in one particular hypertext 
document, a summary in which the particular sentences 
or the particular phrases arranged in the existing order
are written in succession to a top sentence or a top 
phrase of the particular hypertext document is pre- 
pared. Thereafter, the indexes of the particular hyper- 
text documents with the summaries of the particular 
hypertext documents are displayed for each particular
word by the retrieval result displaying unit 43 in the 
ranked order determined in the document ranking deter- 
mining unit 42. 

Accordingly, because the summary of one particular
hypertext document is displayed for each of the particular
hypertext documents, the user can grasp an
outline of contents of each particular hypertext document
by reading the summary of each particular hypertext
document without calling each particular hypertext
document from the hypertext document managing unit
8, and the user can easily select one or more particular
hypertext documents meeting the user's retrieval request.
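
The preparation of a summary described above can be sketched in Python as follows. This is only an illustrative sketch: the names split_sentences and make_summary, the sentence-splitting rule based on punctuation, and the sample text are assumptions, whereas the apparatus itself locates the sentences from the positional information of the retrieval index.

```python
import re


def split_sentences(text):
    """Naive sentence splitter (an assumption): break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def make_summary(text, word):
    """Top sentence followed by every later sentence containing `word`,
    kept in the order in which the sentences occur in the document."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    top, rest = sentences[0], sentences[1:]
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = [s for s in rest if pattern.search(s)]
    return " ".join([top] + hits)


# Hypothetical document text used only to exercise the sketch.
sample = ("Fruit of northern Japan. The apple is grown in orchards. "
          "Pears are also grown. Apple harvest starts in autumn.")
summary = make_summary(sample, "apple")
```

For the sample text, the summary consists of the top sentence followed by the two sentences that contain the word "apple".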

In this embodiment, even though a particular sen- 
tence or a particular phrase including one particular 
word appears many times in one particular hypertext
document, all particular sentences or all particular 
phrases including the particular word are extracted from 
the particular hypertext document, and a summary is 
prepared. However, in cases where a summary of one 
particular hypertext document obtained by connecting a
series of particular sentences or a series of particular 
phrases of the particular hypertext document with a top 
sentence or a top phrase of the particular hypertext document
becomes too long, it is difficult for the user to
quickly grasp the long summary. Therefore, it is applicable
that three particular sentences or three particular
phrases of the particular hypertext document be con- 
nected with a top sentence or a top phrase of the partic- 
ular hypertext document to prepare a summary for each 
particular word when the number of keywords input by
the user is five or less, two particular sentences or two 
particular phrases of the particular hypertext document 
be connected with a top sentence or a top phrase of the 
particular hypertext document to prepare a summary for 
each particular word when the number of keywords
input by the user is ten or less, or one particular sen- 
tence or one particular phrase of the particular hyper- 
text document be connected with a top sentence or a 
top phrase of the particular hypertext document to prepare
a summary for each particular word when the
number of keywords input by the user is eleven or more. 
Therefore, the summary is prevented from becoming
too long, and the user can efficiently read a number of
summaries displayed by the retrieval result displaying
unit 43.
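
The limitation on the length of the summary can be expressed as a small rule. The following Python sketch is an illustration only; the function names sentences_per_word and truncate_hits are assumptions, and the thresholds are those given in the above description (five or less, ten or less, eleven or more).

```python
def sentences_per_word(num_keywords):
    """Number of particular sentences (or phrases) kept per particular word."""
    if num_keywords <= 5:
        return 3
    if num_keywords <= 10:
        return 2
    return 1


def truncate_hits(hit_sentences, num_keywords):
    """Keep only the leading sentences allowed for this number of keywords."""
    return hit_sentences[:sentences_per_word(num_keywords)]


# Hypothetical sentences that contain one particular word.
hits = ["s1", "s2", "s3", "s4"]
kept_for_three_keywords = truncate_hits(hits, 3)
kept_for_twelve_keywords = truncate_hits(hits, 12)
```

The kept sentences are then connected with the top sentence or the top phrase in the same manner as before.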

Also, it is preferred that the concept of the second 
embodiment and the concept of the fifth embodiment be 
combined. For example, when a first group of the partic- 
ular hypertext document D83 and the parent documents 
D80 and D81 is set to the fourth rank, a second group of 
the particular hypertext document D85 and the parent 
documents D81 and D83 is set to the 18-th rank and a
third group of the particular hypertext document D86
and the parent documents D81 and D83 is set to the 19-th
rank according to the second embodiment, as shown
in Fig. 12, a summary of the particular hypertext docu- 
ment D83 is added to the first group, a summary of the 
particular hypertext document D85 is added to the sec- 
ond group and a summary of the particular hypertext 
document D86 is added to the third group. 

(Sixth Embodiment) 

In the world wide web, a composition (or an article) 
is divided into a number of portions, and each portion of 
the composition is written in one hypertext document. 
Therefore, there is a case that a context of the composi- 
tion is not sufficiently expressed in one portion of the 
composition written in one hypertext document. For 
example, though an apple grown in Aomori is described 
in the composition, the word "Aomori" indicating a pro- 
duction place of the apple is not written in the hypertext 
document D83 but is written in the parent document 
D81. 

Therefore, in cases where a plurality of keywords 
expressing a context of a composition are separately 
used in a hypertext document and a plurality of parent 
documents having a reference relationship with the 
hypertext document, the hypertext document is undesir- 
ably ranked to a lower class in the prior art. However, in 
the sixth embodiment, one combined hypertext docu- 
ment produced by combining a retrieval hypertext docu- 
ment (or a particular hypertext document) and one 
parent document having a reference relationship with 
the retrieval hypertext document is prepared for each of 
the parent documents, importance degrees of the com- 
bined hypertext documents are compared with each 
other, one combined hypertext document having the 
maximum importance degree is selected, and the max- 
imum importance degree is used as an importance 
degree for the retrieval hypertext document. 

Fig. 13 is a block diagram of a hypertext retrieving 
apparatus according to a sixth embodiment of the 
present invention. 

As shown in Fig. 13, a hypertext retrieving appara- 
tus 51 for retrieving one or more hypertext documents 
likely to meet a user's retrieval request from a large vol- 
ume of hypertext documents stored in the hypertext 
document managing unit 8, comprises the hypertext 
document table with parent document list preparing unit 
7, the retrieval index preparing unit 6, the keyword input 
unit 2, the retrieving unit 3,

a document ranking determining unit 52 for combining
one particular hypertext document and one particular
parent document corresponding to the
particular hypertext document to form a combined
particular hypertext document according to the document
information of the hypertext document table
prepared by the hypertext document table with parent
document list preparing unit 7 for each of the
particular parent documents corresponding to the
particular hypertext document and each of the particular
hypertext documents obtained in the retrieving
unit 3, calculating estimated values for the
combined particular hypertext documents according
to the particular word information of the retrieval
index obtained in the retrieval index preparing unit 6
for each of the particular hypertext documents,
determining a plurality of importance degrees of the
combined particular hypertext documents according
to the estimated values for each of the particular
hypertext documents, comparing the importance
degrees of the combined particular hypertext documents
with each other for each of the particular
hypertext documents, selecting a maximum importance
degree among the importance degrees of the
combined particular hypertext documents relating
to one particular hypertext document for each of the
particular hypertext documents, setting the maximum
importance degree to an importance degree
for the particular hypertext document for each of the
particular hypertext documents, determining the
ranking of the particular hypertext documents
according to those importance degrees and preparing
an index of one particular hypertext document
for each of the particular hypertext documents, and
a retrieval result displaying unit 53 for displaying a
group of the indexes of the particular hypertext documents
with the summaries of the particular hypertext
documents in the ranked order determined in
the document ranking determining unit 52 for each
particular word as a retrieval result.

In the above configuration, when a keyword "apple"
and another keyword "Aomori" are input by the user on
condition that a word "apple" appears in the hypertext
document D83 and a word "Aomori" indicating an apple-producing
prefecture does not appear in the hypertext
document D83 or D82 but appears in the hypertext document
D81, because a particular word "apple" agreeing
with the keyword "apple" appears in the hypertext document
D83, the hypertext document D83 is set as a particular
hypertext document in the retrieving unit 3.

Thereafter, in the document ranking determining 
unit 52, the particular hypertext document D83 and the 
particular parent document D81 are combined to form a 
first combined particular hypertext document, the particular
hypertext document D83 and the particular parent
document D82 are combined to form a second combined
particular hypertext document, estimated values
for the combined particular hypertext documents are
calculated for each of the particular words, a first sum of
the estimated value of the first combined particular 
hypertext document for the particular words and a sec- 
ond sum of the estimated value of the second combined 
particular hypertext document for the particular words 
are calculated. In this case, because the particular word 
"Aomori" does not appear in the hypertext document 
D82 but appears in the hypertext document D81, the first
sum of the estimated value of the first combined partic- 
ular hypertext document is higher than the second sum 
of the estimated value of the second combined particu- 
lar hypertext document. Therefore, the first combined 
particular hypertext document is selected, and the first 
sum of the estimated value of the first combined partic- 
ular hypertext document is set as an estimated value of 
the particular hypertext document D83 for the keywords 
"apple" and "Aomori", and an importance degree for the 
particular hypertext document D83 is calculated from 
the estimated value of the particular hypertext docu- 
ment D83. In the same manner, importance degrees for 
other particular hypertext documents are calculated, 
and the ranking of the particular hypertext documents is 
determined according to the importance degrees. 
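
The selection of the maximum value among the combined particular hypertext documents can be sketched in Python as follows. This is a simplified illustration: the names score and best_combined_score and the sample texts are assumptions, and a plain count of keyword occurrences is used here as a stand-in for the estimated value, which the apparatus calculates from the retrieval index.

```python
import re


def score(text, keywords):
    """Stand-in for the estimated value: total keyword occurrences."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(tokens.count(k.lower()) for k in keywords)


def best_combined_score(doc_text, parent_texts, keywords):
    """Combine the document with each parent separately and keep the maximum.

    Returns (parent identifier, score); the identifier is None when the
    document has no parent document.
    """
    if not parent_texts:
        return None, score(doc_text, keywords)
    scored = {pid: score(doc_text + " " + text, keywords)
              for pid, text in parent_texts.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical texts standing in for D83 and its parents D81 and D82.
d83 = "This apple is sweet and crisp."
parents = {"D81": "Aomori is famous for fruit.",
           "D82": "A list of fruit pages."}
best_parent, best_value = best_combined_score(d83, parents, ["apple", "Aomori"])
```

Because "Aomori" appears only in the text standing in for D81, the combination with D81 gives the higher value and is selected.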

Accordingly, even though a plurality of keywords 
expressing a context of a composition are separately 
used in a hypertext document and a plurality of parent 
documents having a reference relationship with the 
hypertext document, because a combined particular 
hypertext document obtained by combining one particu- 
lar hypertext document and one particular parent docu- 
ment is formed for each of the particular parent 
documents and a maximum estimated value of one 
combined particular hypertext document among those 
of the combined particular hypertext documents is set 
as an estimated value for the particular hypertext document,
there is no possibility that the particular hypertext
document is undesirably ranked to a lower class.

(Seventh Embodiment) 

A heading portion of a hypertext document nor- 
mally indicates a feature of the hypertext document very 
well. Therefore, to give greater weight to a particular word
appearing in the heading portion of the hypertext docu- 
ment, an occurrence frequency of the particular word 
agreeing with one keyword in the heading portion of the 
hypertext document is doubled. As an example of the 
heading portion, a title of the hypertext document or an 
anchor sentence of a parent document having a refer- 
ence relationship with the hypertext document is con- 
sidered in this embodiment. 

Fig. 14 is a block diagram of a hypertext retrieving 
apparatus according to a seventh embodiment of the 
present invention. 

As shown in Fig. 14, a hypertext retrieving appara- 
tus 61 for retrieving one or more hypertext documents 
likely to meet a user's retrieval request from a large vol- 
ume of hypertext documents stored in the hypertext 
document managing unit 8, comprises the hypertext
document table with parent document list preparing unit
7, the retrieval index preparing unit 6, the keyword input 
unit 2, the retrieving unit 3, 

a document ranking determining unit 62 for unifying
one particular hypertext document and one or more 
particular parent documents corresponding to the 
particular hypertext document to a unified particular 
hypertext document according to the document 
information of the hypertext document table prepared
by the hypertext document table with parent
document list preparing unit 7 for each of the partic- 
ular hypertext documents obtained in the retrieving 
unit 3, calculating an occurrence frequency TF of 
one particular word in one unified particular hypertext
document for each particular word and each
unified particular hypertext document on condition 
that an occurrence frequency of the particular word 
appearing in a heading portion of the unified particular
hypertext document is doubled, calculating
an inverse document frequency IDF defined as an 
inverse value of the number of particular hypertext 
documents, in which one particular word appears, 
for each particular word, calculating a product 
TF*IDF of one occurrence frequency TF and one
inverse document frequency IDF, summing a plural- 
ity of products for all particular words to produce a 
summed product as an estimated value for each 
particular hypertext document, determining a plurality
of importance degrees of the unified particular
hypertext documents according to the estimated 
values, determining the ranking of the particular 
hypertext documents according to the importance 
degrees for the unified particular hypertext documents
and preparing an index of one particular
hypertext document for each of the particular 
hypertext documents, and 
a retrieval result displaying unit 63 for displaying the 
indexes of the particular hypertext documents in the 
ranked order determined in the document ranking
determining unit 62 as a retrieval result. 

In the above configuration, a heading portion of 
each unified particular hypertext document is composed
of a title of one particular hypertext document
corresponding to the unified particular hypertext docu- 
ment and one or more anchor sentences of particular 
parent documents having a reference relationship with 
the particular hypertext document. For example, in 
cases where a particular word agreeing with one keyword
appears six times in one unified particular hypertext
document on condition that the particular word
appears three times in the heading portion of the unified 
particular hypertext document, the particular word 
appearing in the heading portion of the unified particular
hypertext document is double-counted each time the particular
word appears, so that an occurrence frequency TF of the particular
word in the unified particular hypertext document is
equal to 9. Thereafter, one particular hypertext document
corresponding to the unified particular hypertext
document is ranked according to the occurrence fre- 
quency TF=9. 
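
The double counting and the summed product TF*IDF can be illustrated by the following Python sketch. The function names weighted_tf and estimated_value and the sample figures for a second word are assumptions made for illustration; the inverse document frequency is taken, as defined above, to be the inverse of the number of particular hypertext documents in which the word appears.

```python
def weighted_tf(body_count, heading_count, heading_weight=2):
    """Occurrences outside the heading portion count once; occurrences in
    the heading portion count `heading_weight` times (doubled by default)."""
    return body_count + heading_weight * heading_count


def estimated_value(tf_by_word, doc_count_by_word):
    """Sum of TF * IDF over the particular words, with IDF = 1 / (number of
    particular hypertext documents in which the word appears)."""
    return sum(tf * (1.0 / doc_count_by_word[word])
               for word, tf in tf_by_word.items())


# Six occurrences in total, three of them in the heading portion.
tf_apple = weighted_tf(body_count=3, heading_count=3)

# Hypothetical second word and document counts, for illustration only.
value = estimated_value({"apple": tf_apple, "aomori": 2},
                        {"apple": 3, "aomori": 1})
```

Here 3 + 2 x 3 gives the occurrence frequency of 9 stated above; setting heading_weight to 3 or more corresponds to weighting the heading portion more heavily still.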

Accordingly, because the heading portion of the 
hypertext document normally indicates a feature of the 
hypertext document very well and the particular word 
appearing in the heading portion of the unified particular 
hypertext document is double-counted, reliability of the
ranking of the particular hypertext documents can be
further heightened.

In an HTML hypertext document written by the 
hypertext mark-up language, a small index is expressed 
by a character string surrounded by "<h1>" and "</h1>".
Therefore, it is applicable that the small index be 
included in the heading portion of the HTML hypertext 
document. 
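
As an illustration of how such small indexes might be collected from an HTML hypertext document, the following Python sketch uses the HTMLParser class of the standard html.parser module. The class name H1Collector, the function name extract_h1 and the sample document are assumptions and are not part of the apparatus.

```python
from html.parser import HTMLParser


class H1Collector(HTMLParser):
    """Collect the text found between <h1> and </h1> tags."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.headings[-1] += data


def extract_h1(html):
    """Return the small indexes of an HTML document as a list of strings."""
    collector = H1Collector()
    collector.feed(html)
    collector.close()
    return [h.strip() for h in collector.headings]


# Hypothetical HTML hypertext document.
page = "<html><h1>Apple guide</h1><p>Text.</p><H1>Storage</H1></html>"
small_indexes = extract_h1(page)
```

The collected strings can then be treated as part of the heading portion together with the title and the anchor sentences.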

In this embodiment, the occurrence frequency of 
the particular word appearing in the heading portion of 
the unified particular hypertext document is doubled. 
However, it is applicable that the occurrence frequency 
of the particular word be increased three or more times. 

(Eighth Embodiment) 

In the hypertext documents of the world wide web, 
there is a special hypertext document in which a 
number of anchor sentences exist and any other sen- 
tences do not exist. This special hypertext document is 
generally called a link page. Even though the link page 
is retrieved and displayed, any useful information meet- 
ing a user's retrieval intention does not exist in the link 
page. Therefore, an occurrence number of a particular 
word in the link page is lowered to zero in this embodi- 
ment. 

Fig. 15 is a block diagram of a hypertext retrieving 
apparatus according to an eighth embodiment of the 
present invention. 

As shown in Fig. 15, a hypertext retrieving appara- 
tus 71 for retrieving one or more hypertext documents 
likely to meet a user's retrieval request from a large vol- 
ume of hypertext documents stored in the hypertext 
document managing unit 8, comprises the hypertext 
document table with parent document list preparing unit 
7, the retrieval index preparing unit 6, the keyword input 
unit 2, the retrieving unit 3, 

a document ranking determining unit 72 for unifying 
one particular hypertext document and one or more 
particular parent documents corresponding to the 
particular hypertext document to a unified particular 
hypertext document according to the document 
information of the hypertext document table pre- 
pared by the hypertext document table with parent 
document list preparing unit 7 for each of the partic- 
ular hypertext documents obtained in the retrieving 
unit 3, specifying a link page from among the particular
hypertext documents, calculating an occurrence
frequency TF of one particular word in one
unified particular hypertext document for each particular
word and each unified particular hypertext
document on condition that an occurrence fre- 
quency of the particular word in the link page is 
reduced by one each time the particular word is
found out in the link page treated as one particular 
parent document of the unified particular hypertext 
document, calculating an inverse document fre- 
quency IDF defined as an inverse value of the 
number of particular hypertext documents, in which
one particular word appears, for each particular 
word, calculating a product TF*IDF of one occur- 
rence frequency TF and one inverse document fre- 
quency IDF, summing a plurality of products for all 
particular words to produce a summed product as
an estimated value for each particular hypertext 
document, determining a plurality of importance 
degrees of the unified particular hypertext documents
according to the estimated values, determining
the ranking of the particular hypertext
documents according to the importance degrees for 
the unified particular hypertext documents and pre- 
paring an index of one particular hypertext docu- 
ment for each of the particular hypertext 
documents, and
a retrieval result displaying unit 73 for displaying the 
indexes of the particular hypertext documents in the 
ranked order determined in the document ranking 
determining unit 72 as a retrieval result.

In the above configuration, the hypertext document 
D82 is, for example, a link page relating to the particular 
word "apple" and is composed of ten anchor sentences.
Therefore, ten reference documents respectively having 
a reference relationship with the hypertext document
D82 exist. When an occurrence frequency of the partic- 
ular word "apple" in a unified particular hypertext docu- 
ment composed of one reference document treated as 
one particular hypertext document and the hypertext 
document D82 treated as one particular parent document
is calculated, an occurrence frequency of the particular
word "apple" in the hypertext document D82
treated as one particular hypertext document is reduced 
by one each time the particular word "apple" is found 
out in the particular parent document D82. This reducing
operation is performed for all reference documents
treated as the particular hypertext documents. 

Therefore, even though the particular word "apple" 
appears in the hypertext document D82 many times, the 
occurrence frequency of the particular word "apple" in
the hypertext document D82 is necessarily reduced to 
zero, and the hypertext document D82 is ranked to the 
lowest class. 

Accordingly, any particular hypertext document 
functioning as one link page can be always ranked to
the lowest class. 
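
The net effect of this embodiment can be sketched in Python as follows. The sketch models only the result (an occurrence frequency of zero for a link page) and not the step-by-step reducing operation; the representation of a document as a list of (sentence, is_anchor) pairs and the names is_link_page and occurrence_count are assumptions made for illustration.

```python
import re


def is_link_page(sentences):
    """A link page contains anchor sentences and no other sentences.

    `sentences` is a list of (text, is_anchor) pairs (an assumed layout).
    """
    return bool(sentences) and all(is_anchor for _, is_anchor in sentences)


def occurrence_count(sentences, word):
    """Occurrences of `word`, forced to zero when the document is a link page."""
    if is_link_page(sentences):
        return 0
    target = word.lower()
    return sum(re.findall(r"\w+", text.lower()).count(target)
               for text, _ in sentences)


# Hypothetical contents: D82 holds only anchors, D83 also holds a body sentence.
d82 = [("Apple farms", True), ("Apple recipes", True)]
d83 = [("The apple is sweet.", False), ("Apple recipes", True)]

count_d82 = occurrence_count(d82, "apple")
count_d83 = occurrence_count(d83, "apple")
```

Although the word appears twice in the stand-in for D82, its count is zero because every sentence of that document is an anchor sentence.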



(Ninth Embodiment) 

There is a long hypertext document composed of a 
plurality of blocks respectively corresponding to a 
meaning, and a reference label is arranged in the top of 
each block of the long hypertext document. In this 
embodiment, the long hypertext document is divided 
into the plurality of blocks, and a hypertext document 
table corresponding to each block of the long hypertext 
document is prepared. 

Fig. 16 is a block diagram of a hypertext retrieving 
apparatus according to a ninth embodiment of the 
present invention. 

As shown in Fig. 16, a hypertext retrieving appara- 
tus 76 for retrieving one or more hypertext documents 
likely to meet a user's retrieval request from a large vol- 
ume of hypertext documents stored in the hypertext 
document managing unit 8, comprises 

a hypertext document table with parent document 
list preparing unit 77 for analyzing the hypertext 
documents having the reference relationships 
which are managed by the hypertext document 
managing unit 8, specifying a long hypertext docu- 
ment composed of a plurality of blocks respectively 
corresponding to a meaning, setting each block of 
the long hypertext document as one hypertext doc- 
ument corresponding to one meaning, preparing 
hypertext document information in which one or 
more parent document identifiers identifying one or 
more parent documents and anchor sentences of 
the parent documents are listed with one hypertext 
document identifier identifying one hypertext docu- 
ment and a document storing position of the hyper- 
text document, for each of the hypertext 
documents, and preparing a hypertext document 
table of the hypertext document information for all
hypertext documents managed by the hypertext 
document managing unit 8, 
the retrieval index preparing unit 6, the keyword 
input unit 2, the retrieving unit 3, the document 
ranking determining unit 4 and the retrieval result 
displaying unit 73. 

In the above configuration, as shown in Fig. 17, in 
cases where a long hypertext document D87 composed 
of a plurality of blocks respectively corresponding to a 
meaning exists in the hypertext documents managed by 
the hypertext document managing unit 8, the long 
hypertext document D87 is specified by the hypertext 
document table with parent document list preparing unit 
77, and one or more reference labels respectively 
arranged on the top of one block of the long hypertext 
document D87 are found out. Thereafter, the long
hypertext document D87 is divided into the plurality of 
blocks, and each block of the long hypertext document 
D87 is set as one hypertext document D87, D88 or D89. 
In this case, when the user reads a character string
"ABC" or "XYZ" of an anchor sentence of one hypertext
document, the user can immediately refer to the reference
label such as "#ABC" or "#XYZ" of another hypertext
document. Thereafter, a hypertext document table
of the hypertext document information for all hypertext
documents is prepared in the same manner as in the
first embodiment.

Accordingly, even though a long hypertext docu- 
ment composed of a plurality of blocks respectively cor- 
responding to a meaning exists in the hypertext 
documents, because the long hypertext document is
divided into the blocks and each block of the long hyper- 
text document is set as one hypertext document to pre- 
pare the hypertext document information for each block 
of the long hypertext document, the hypertext documents
respectively relating to one meaning can be
ranked, so that the user can easily retrieve a group of 
hypertext documents likely to meet his request. 
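
The division of a long hypertext document into blocks can be sketched in Python as follows. It is assumed here, for illustration only, that each reference label is written as an HTML anchor of the form <a name="...">; the function name split_blocks, the identifier format "D87#ABC" and the sample document are likewise assumptions.

```python
import re

# Assumed form of a reference label: <a name="ABC">
LABEL = re.compile(r'<a\s+name="([^"]+)"\s*>', re.IGNORECASE)


def split_blocks(doc_id, html):
    """Split `html` at each reference label; return {block identifier: text}."""
    matches = list(LABEL.finditer(html))
    if not matches:
        return {doc_id: html}
    blocks = {}
    preamble = html[:matches[0].start()].strip()
    if preamble:
        blocks[doc_id] = preamble  # text before the first label
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(html)
        blocks[doc_id + "#" + match.group(1)] = html[match.start():end].strip()
    return blocks


# Hypothetical long document with two reference labels.
long_doc = ('Intro text. <a name="ABC"></a>First block. '
            '<a name="XYZ"></a>Second block.')
blocks = split_blocks("D87", long_doc)
```

Each resulting block can then be registered in the hypertext document table as one hypertext document.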

In this embodiment, in cases where a small index 
expressed by a character string surrounded by "<h1>"
and "</h1>" is used in a long hypertext document, it is
applicable that the long hypertext document be divided 
into a plurality of blocks on condition that one reference 
label or one small index is arranged on the top of each 
block.

(Tenth Embodiment) 

In cases where the user intends to again retrieve a 
plurality of hypertext documents by changing an initial 
keyword to another keyword which relates to a plurality 
of particular hypertext documents displayed according 
to the initial keyword, the user generally desires to 
know one or more words frequently appearing
in the particular hypertext documents. Therefore, in this 
embodiment, one or more words frequently appearing 
in the particular hypertext documents are displayed. 

Fig. 18 is a block diagram of a hypertext retrieving 
apparatus according to a tenth embodiment of the 
present invention. 

As shown in Fig. 18, a hypertext retrieving appara- 
tus 91 for retrieving one or more hypertext documents 
likely to meet a user's retrieval request from a large vol- 
ume of hypertext documents stored in the hypertext 
document managing unit 8, comprises 

the hypertext document table with parent document 
list preparing unit 7, the retrieval index preparing 
unit 6, the keyword input unit 2, the retrieving unit 3, 
a document ranking determining unit 92 for unifying 
one particular hypertext document and one or more 
particular parent documents corresponding to the 
particular hypertext document to a unified particular 
hypertext document according to the document 
information of the hypertext document table pre- 
pared by the hypertext document table with parent 
document list preparing unit 7 for each of the partic- 
ular hypertext documents obtained in the retrieving 
unit 3, calculating an occurrence frequency TF of 
one particular word in one unified particular hypertext
document for each particular word and each
unified particular hypertext document, calculating 
an inverse document frequency IDF defined as an 
inverse value of the number of particular hypertext 
documents, in which one particular word appears, 
for each particular word, calculating a product 
TF*IDF of one occurrence frequency TF and one
inverse document frequency IDF, summing a plural- 
ity of products for all particular words to produce a 
summed product as an estimated value for each 
particular hypertext document, determining a plu- 
rality of importance degrees of the unified particular 
hypertext documents according to the estimated 
values, determining the ranking of the particular 
hypertext documents according to the importance 
degrees for the unified particular hypertext docu- 
ments, preparing an index of one particular hyper- 
text document for each of the particular hypertext 
documents, selecting a plurality of high-ranking 
hypertext documents from the particular hypertext 
documents, extracting a plurality of related words 
listed in a plurality of word lists of pieces of hyper- 
text document information of the hypertext docu- 
ment table corresponding to the high-ranking 
hypertext documents, calculating an occurrence 
frequency TF of one related word in one high-rank- 
ing hypertext document for each related word and 
each high-ranking hypertext document, calculating 
an inverse document frequency IDF defined as an 
inverse value of the number of high-ranking hyper- 
text documents, in which one related word appears, 
for each related word, calculating a sum of a plural- 
ity of products TF*IDF for all high-ranking hypertext 
documents to produce a summed product as an 
importance degree for each related word, compar- 
ing the importance degrees of the related words 
with each other, selecting a plurality of high-ranking 
related words of which the importance degrees are 
higher than those of other related words, and pre- 
paring a hypertext mark-up language (HTML) docu- 
ment in which a plurality of keyword selection 
buttons corresponding to the high-ranking related 
words are arranged in the decreasing order of the 
importance degrees of the high-ranking related 
words to select one high-ranking related word by 
pushing one keyword selection button, and 
a retrieval result displaying unit 93 for displaying the 
indexes of the particular hypertext documents in the 
ranked order determined in the document ranking 
determining unit 92 as a retrieval result on a result 
displaying window W1 and displaying the HTML 
document prepared by the document ranking deter- 
mining unit 92 on a high-ranking related word 
selecting window W2. 
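
The document-ranking part of unit 92 can be illustrated with a minimal sketch. It assumes whitespace tokenisation, that a unified document is simply the body joined with the anchor sentences of its parent documents, and that the document count used for IDF is taken over the unified documents; the names `unify` and `rank_documents` are hypothetical and do not appear in the specification.

```python
from collections import Counter


def unify(body, anchor_sentences):
    """Join a document body with the anchor sentences of its parents (assumed form of unification)."""
    return " ".join([body, *anchor_sentences])


def rank_documents(keywords, documents):
    """Return (document id, estimated value) pairs, highest value first.

    documents maps a document identifier to (body, list of parent anchor sentences).
    """
    # Word counts per unified document; splitting on whitespace is an assumption.
    counts = {
        doc_id: Counter(unify(body, anchors).lower().split())
        for doc_id, (body, anchors) in documents.items()
    }
    scores = dict.fromkeys(documents, 0.0)
    for word in keywords:
        word = word.lower()
        # IDF as defined in the text: inverse of the number of documents containing the word.
        df = sum(1 for c in counts.values() if c[word] > 0)
        if df == 0:
            continue
        idf = 1.0 / df
        for doc_id, c in counts.items():
            scores[doc_id] += c[word] * idf  # sum of TF*IDF over all keywords
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "D83": ("apple orchard apple", ["apple farm"]),
    "D85": ("apple pie", []),
    "D86": ("orchard tour", ["apple"]),
}
ranking = rank_documents(["apple", "orchard"], docs)
```

In this toy input, document D86 contains "apple" only in the anchor sentence of its parent, yet it still receives a contribution for that keyword because the count is taken over the unified document.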

In the above configuration, in cases where the tenth 
embodiment and the third embodiment are combined, 
as shown in Fig. 19, when a keyword "apple" is input to 
the keyword input unit 2, a plurality of indexes of partic-
ular hypertext documents such as documents D83, D85
and D86 and a plurality of indexes of parent documents 
such as documents D80 and D81 are, for example, dis- 
played on the result displaying window W1 in the same 
manner as in the third embodiment. Thereafter, in the 
document ranking determining unit 92, ten high-ranking 
hypertext documents are selected from the particular 
hypertext documents, a plurality of related words listed 
in a plurality of word lists of pieces of hypertext docu- 
ment information of the hypertext document table corre- 
sponding to the high-ranking hypertext documents are 
extracted, a sum of a plurality of products TF*IDF for all 
high-ranking hypertext documents is calculated for each 
related word, and importance degrees for the related 
words are determined. Thereafter, ten high-ranking 
related words "Shinshu", "farmer", "product", "Aomori", 
"manure", "farm", "festival", "Nebuta", "Nagano" and 
"Olympics" are selected from the related words, an 
HTML document in which ten keyword selection buttons 
corresponding to the high-ranking related words are 
arranged in the decreasing order of the importance 
degrees of the high-ranking related words is prepared, 
and the HTML document is displayed on the high-rank- 
ing related word selecting window W2. 
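
The related-word selection described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of unit 92: each high-ranking document is represented by a plain list of words, ties are broken alphabetically, and the function name `related_word_importance` is hypothetical.

```python
from collections import Counter


def related_word_importance(word_lists, top_n=10):
    """Return the top_n (word, importance degree) pairs.

    word_lists holds one list of words per high-ranking hypertext document.
    """
    counts = [Counter(words) for words in word_lists]
    vocabulary = set().union(*counts)
    importance = {}
    for word in vocabulary:
        # IDF: inverse of the number of high-ranking documents containing the word.
        df = sum(1 for c in counts if c[word] > 0)
        idf = 1.0 / df
        # Importance degree: sum of TF*IDF over all high-ranking documents.
        importance[word] = sum(c[word] * idf for c in counts)
    # Decreasing importance; alphabetical order on ties is an assumption.
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


word_lists = [
    ["shinshu", "farmer", "shinshu"],
    ["shinshu", "festival"],
    ["farmer"],
]
top_words = related_word_importance(word_lists, top_n=2)
```

The returned order is the order in which the keyword selection buttons would be arranged on the window W2.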

Therefore, when the user pushes the keyword button
corresponding to the high-ranking related word "Shin- 
shu", the word "Shinshu" indicating an apple-producing 
district is input to the keyword input unit 2 as a keyword, 
importance degrees of a plurality of particular hypertext 
documents corresponding to the keyword "Shinshu" are 
determined, and the particular hypertext documents 
arranged in the decreasing order of the importance 
degrees are displayed on the result displaying window 
W1 in the same manner as in the first embodiment. 

Accordingly, even though the user cannot initially 
bring an appropriate keyword to his mind, the user can 
select one or more keywords closer to his retrieval 
intention. Also, the user can change his retrieval inten- 
tion by referring to the high-ranking related words, and a 
plurality of particular hypertext documents correspond- 
ing to a new keyword selected by the user according to 
his new retrieval intention can be displayed. 

In this case, the user can push the keyword selec- 
tion button by using a pointing device without using a 
keyboard. Also, the keyword selection buttons are 
embodied by operating a JAVA script in which the high- 
ranking related words are added to a text box, a "clear" 
button is embodied by operating a JAVA script in which 
one high-ranking related word added to the text box is 
cleared, an "initial condition" button is embodied by 
operating a JAVA script in which the high-ranking 
related words added to the text box are returned to an 
initial group of keywords such as "apple", and a "re-
retrieval" button is embodied by operating a JAVA script
in which a retrieval operation is performed again by using
one or more words added to the text box as one or more 
keywords. 
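
The behaviour of the four buttons can be modelled independently of the script that embodies them. The sketch below is a hypothetical model in Python, not the script itself; it assumes that the "clear" button removes the most recently added word, which the text does not specify, and the class name `KeywordBox` is invented for illustration.

```python
class KeywordBox:
    """Model of the text box driven by the keyword selection buttons."""

    def __init__(self, initial_keywords):
        self._initial = list(initial_keywords)
        self.words = list(initial_keywords)

    def select(self, related_word):
        """Keyword selection button: add a high-ranking related word to the text box."""
        if related_word not in self.words:
            self.words.append(related_word)

    def clear(self):
        """'clear' button: remove one word (assumed here to be the last one added)."""
        if self.words:
            self.words.pop()

    def reset(self):
        """'initial condition' button: return to the initial group of keywords."""
        self.words = list(self._initial)

    def re_retrieve(self, search):
        """'re-retrieval' button: run the retrieval again with the words in the text box."""
        return search(list(self.words))


box = KeywordBox(["apple"])
box.select("Shinshu")
box.select("farmer")
box.clear()
after_clear = list(box.words)
box.reset()
```

Here `search` stands for whatever callable performs the retrieval; it is passed in so that the model stays independent of the retrieving unit.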

In this embodiment, the high-ranking hypertext documents
are selected from the particular hypertext documents.
However, the high-ranking hypertext documents may instead
be selected from the particular hypertext documents and
the parent documents. In this case, related words can be
collected widely from a plurality of hypertext documents
having a reference relationship with each other.

(Eleventh Embodiment) 

In the tenth embodiment, the importance degrees
of the related words are determined without any con-
nection with the keyword initially input by the user. How-
ever, in cases where the user desires to select a related
word having a close correlation with the keyword, it is
preferred that a related word having a close correlation
with a keyword be preferentially selected as a high-
ranking related word. Therefore, in this embodiment, an
occurrence frequency of a related word having a close 
correlation with a keyword is doubled to heighten an 
importance degree of the related word. 

Fig. 20 is a block diagram of a hypertext retrieving 
apparatus according to an eleventh embodiment of the 
present invention. 

As shown in Fig. 20, a hypertext retrieving appara-
tus 101 for retrieving one or more hypertext documents
likely to meet a user's retrieval request from a large vol- 
ume of hypertext documents stored in the hypertext 
document managing unit 8, comprises 

the hypertext document table with parent document 
list preparing unit 7, the retrieval index preparing 
unit 6, the keyword input unit 2, the retrieving unit 3, 
a document ranking determining unit 102 for unify- 
ing one particular hypertext document and one or 
more particular parent documents corresponding to 
the particular hypertext document to a unified par- 
ticular hypertext document according to the docu- 
ment information of the hypertext document table 
prepared by the hypertext document table with par- 
ent document list preparing unit 7 for each of the 
particular hypertext documents obtained in the 
retrieving unit 3, calculating an occurrence fre- 
quency TF of one particular word in one unified par- 
ticular hypertext document for each particular word 
and each unified particular hypertext document, 
calculating an inverse document frequency IDF 
defined as an inverse value of the number of partic- 
ular hypertext documents, in which one particular 
word appears, for each particular word, calculating 
a product TF*IDF of one occurrence frequency TF 
and one inverse document frequency IDF, summing 
a plurality of products for all particular words to pro- 
duce a summed product as an estimated value for 
each particular hypertext document, determining a 
plurality of importance degrees of the unified partic- 
ular hypertext documents according to the esti- 
mated values, determining the ranking of the 
particular hypertext documents according to the 
importance degrees for the unified particular hyper-
text documents, preparing an index of one particu-
lar hypertext document for each of the particular
hypertext documents, selecting a plurality of high- 
ranking hypertext documents from the particular 
hypertext documents, extracting a plurality of
related words listed in a plurality of word lists of 
pieces of hypertext document information of the 
hypertext document table corresponding to the 
high-ranking hypertext documents, calculating an 
occurrence frequency TF of one related word in one
high-ranking hypertext document for each related 
word and each high-ranking hypertext document on 
condition that the related word is double-counted 
when the related word is placed within a distance of 
40 letters from one keyword, calculating an inverse
document frequency IDF defined as an inverse 
value of the number of high-ranking hypertext doc- 
uments, in which one related word appears, for 
each related word, calculating a sum of a plurality of 
products TF*IDF for all high-ranking hypertext doc-
uments to produce a summed product as an impor-
tance degree for each related word, comparing the
importance degrees of the related words with each 
other, selecting a plurality of high-ranking related 
words of which the importance degrees are higher
than those of other related words, and preparing a 
hypertext mark-up language (HTML) document in 
which a plurality of keyword selection buttons corre- 
sponding to the high-ranking related words are 
arranged in the decreasing order of the importance
degrees of the high-ranking related words to select 
one high-ranking related word by pushing one key- 
word selection button, and 
a retrieval result displaying unit 103 for displaying 
the indexes of the particular hypertext documents in
the ranked order determined in the document rank-
ing determining unit 102 as a retrieval result on a
result displaying window W1 and displaying the 
HTML document prepared by the document rank-
ing determining unit 102 on a high-ranking related
word selecting window W2. 

In the above configuration, after the related words 
are extracted in the same manner as in the tenth 
embodiment, an occurrence frequency TF of one
related word in one high-ranking hypertext document is 
calculated for each related word and each high-ranking 
hypertext document. In this case, when the related word 
is placed within a distance of 40 letters from one keyword
"apple", the related word is double-counted.
Therefore, because the related word "Shinshu" indicating
an apple-producing district or the related word
"farmer" often appears within a distance of 40 letters
from one keyword "apple" and because the related word
"Nagano" indicating an apple-producing prefecture or
the related word "Olympics" indicating a festival held in
Nagano in 1998 hardly appears within a distance
of 40 letters from one keyword "apple", as shown in Fig.
21, the related words "Shinshu" and "farmer" are reliably
displayed on the head portion of the high-ranking
related word selecting window W2, and the related
words "Nagano" and "Olympics" are displayed on the
rear portion of the high-ranking related word selecting
window W2 even though the related words "Nagano"
and "Olympics" frequently appear in the particular
hypertext documents.
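
The double-counting rule can be sketched as below. Several points are assumptions rather than statements of the specification: occurrences are found by case-insensitive substring search, and the distance between a related word and a keyword is measured as the number of letters lying between the two occurrences. The function names are hypothetical.

```python
def find_all(text, word):
    """Return the start positions of every occurrence of word in text."""
    positions = []
    start = text.find(word)
    while start != -1:
        positions.append(start)
        start = text.find(word, start + 1)
    return positions


def proximity_weighted_tf(text, related_word, keyword, window=40):
    """Occurrence frequency of related_word, double-counting occurrences near keyword."""
    text = text.lower()
    related_word = related_word.lower()
    keyword = keyword.lower()
    keyword_positions = find_all(text, keyword)
    total = 0
    for pos in find_all(text, related_word):
        near = False
        for kpos in keyword_positions:
            # Letters between the two occurrences (0 if they touch or overlap).
            gap = max(kpos - (pos + len(related_word)), pos - (kpos + len(keyword)), 0)
            if gap <= window:
                near = True
                break
        # An occurrence close to the keyword counts twice, any other occurrence once.
        total += 2 if near else 1
    return total


sample = "apple growers in shinshu ship fruit. " + "x" * 60 + " shinshu hosts a festival."
tf_shinshu = proximity_weighted_tf(sample, "shinshu", "apple")
```

In the sample text the first occurrence of the related word lies 12 letters after the keyword and is counted twice, while the second occurrence lies more than 40 letters away and is counted once.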

Accordingly, one or more related words having a 
strong relationship with the keyword can be displayed in 
high-ranking positions, and one or more related words 
corresponding to a user's retrieval intention differing 
from the initial retrieval intention can be displayed in 
low-ranking positions. 

Having illustrated and described the principles of 
the present invention in a preferred embodiment 
thereof, it should be readily apparent to those skilled in 
the art that the invention can be modified in arrange- 
ment and detail without departing from such principles. 
We claim all modifications coming within the scope of 
the accompanying claims. 

Claims 

1. A hypertext document retrieving apparatus for 
retrieving a plurality of particular hypertext docu- 
ments likely to meet a user's retrieval request from 
a group of hypertext documents having reference 
relationships with each other in which one hypertext 
document having an anchor sentence functions as 
a parent document for another hypertext document 
functioning as a reference document and a user 
refers to one reference document after the user 
selects one anchor sentence of one parent docu- 
ment corresponding to the reference document, 
comprising: 

hypertext document table preparing means for 
preparing hypertext document information, in 
which one hypertext document identifier identi- 
fying one hypertext document, a body of the 
hypertext document, a parent document identi- 
fier identifying a parent document correspond- 
ing to the hypertext document functioning as 
one reference document and an anchor sen- 
tence of the parent document are registered, 
for each of the hypertext documents and pre- 
paring a hypertext document table of the hyper- 
text document information for the hypertext 
documents; 

retrieval index preparing means for recognizing 
a plurality of words appearing in each of the 
hypertext documents and the parent docu- 
ments according to the hypertext document 
table prepared by the hypertext document table 
preparing means, recognizing a plurality of 
occurrence positions of the words in each of 
the hypertext documents and the parent docu- 
ments according to the hypertext document 
table, preparing word information, composed of 
one or more occurrence document identifiers
identifying one or more hypertext documents in 
which one word appears and occurrence posi- 
tions of the word in the hypertext documents 
and one or more anchor sentences of one or
more parent documents corresponding to the 
hypertext documents, for each of the words, 
and preparing a retrieval index of pieces of 
word information for the words; 
keyword receiving means for receiving a key-
word indicating the user's retrieval request;
retrieving means for retrieving particular word 
information corresponding to the keyword 
received by the keyword receiving means from 
the retrieval index prepared by the retrieval
index preparing means and retrieving a plural- 
ity of particular occurrence document identifi- 
ers identifying a plurality of particular hypertext 
documents in which the keyword appears and 
a plurality of particular occurrence positions of
the keyword in the particular hypertext docu- 
ments and one or more particular anchor sen- 
tences of one or more particular parent 
documents corresponding to the particular 
hypertext documents from the particular word
information; 

document ranking determining means for spec- 
ifying the particular hypertext documents which 
are identified by the particular occurrence doc-
ument identifiers retrieved by the retrieving
means, retrieving pieces of particular hypertext 
document information for the particular hyper- 
text documents from the hypertext document 
table prepared by the hypertext document table 
preparing means, unifying one particular
hypertext document and one or more particular 
parent documents corresponding to the partic- 
ular hypertext document to a unified hypertext 
document for each of the particular hypertext 
documents, calculating an occurrence fre-
quency of the keyword in one unified hypertext
document for each unified hypertext document, 
determining a plurality of importance degrees 
of the unified hypertext documents according 
to the occurrence frequencies in the unified
hypertext documents, setting one importance 
degree of one unified hypertext document as 
an importance degree of one particular hyper- 
text document corresponding to the unified 
hypertext document for each unified hypertext
document and determining the ranking of the 
particular hypertext documents according to 
the importance degrees of the particular hyper- 
text documents; and 

retrieval result displaying means for displaying
a plurality of indexes of the particular hypertext 
documents in a ranked order corresponding to 
the ranking of the particular hypertext docu-
ments determined by the document ranking
determining means as a retrieval result.

2. A hypertext document retrieving apparatus accord- 
ing to claim 1 in which an index of one particular 
parent document corresponding to one particular 
hypertext document is displayed with the index of 
the particular hypertext document by the retrieval 
result displaying means for each of the particular 
hypertext documents. 

3. A hypertext document retrieving apparatus accord- 
ing to claim 1 in which a plurality of particular hyper- 
text documents corresponding to the same 
particular parent document are reset to the same 
rank as a highest rank among the ranks determined 
for the particular hypertext documents by the docu- 
ment ranking determining means, and the particu- 
lar hypertext documents set to the same rank are 
displayed with the particular parent document in a 
group by the retrieval result displaying means. 

4. A hypertext document retrieving apparatus accord- 
ing to claim 1 in which a plurality of particular hyper- 
text documents corresponding to the same 
particular parent document are reset to a same 
rank according to a sum of the importance degrees 
for the particular hypertext documents by the docu- 
ment ranking determining means, and the particu- 
lar hypertext documents set to the same rank are 
displayed with the particular parent document in a 
group by the retrieval result displaying means. 

5. A hypertext document retrieving apparatus accord- 
ing to claim 1 in which each of the unified hypertext 
documents is formed by the document ranking 
determining means by unifying one or more anchor 
sentences of one or more particular parent docu- 
ments corresponding to one particular hypertext 
document and the particular hypertext document. 

6. A hypertext document retrieving apparatus accord- 
ing to claim 1 in which a particular sentence or a 
particular phrase including the keyword is extracted 
from each of the particular hypertext documents by 
the document ranking determining means, and a 
summary in which one particular sentence or one 
particular phrase of one particular hypertext docu- 
ment is written in succession to a top sentence or a 
top phrase of the particular hypertext document is 
displayed with the index of the particular hypertext 
document for each of the particular hypertext docu- 
ments. 

7. A hypertext document retrieving apparatus accord- 
ing to claim 1 in which the importance degree of 
each of the unified hypertext documents is deter- 
mined by the document ranking determining means 
by calculating a sum of an occurrence frequency of 
the keyword in one hypertext document and an 
occurrence frequency of the keyword in one parent
document corresponding to the hypertext docu- 
ment for each of the parent documents correspond- 
ing to the hypertext document, selecting a 
maximum sum among the sums for the parent doc-
uments, specifying one particular parent document
corresponding to the maximum sum, determining 
one importance degree for a combination of the 
hypertext document and the particular parent docu-
ment according to the maximum sum and regarding
the importance degree as one importance degree 
of one unified hypertext document corresponding to 
the hypertext document. 

8. A hypertext document retrieving apparatus accord-
ing to claim 1 in which the occurrence frequency of
the keyword in each unified hypertext document is 
calculated by the document ranking determining 
means by double-counting the keyword appearing
in one or more anchor sentences of one or more
particular parent documents corresponding to the 
unified hypertext document. 

9. A hypertext document retrieving apparatus accord-
ing to claim 1 in which the occurrence frequency of
the keyword in one hypertext document functioning
as a link page composed of one or more anchor
sentences is set to zero by the document ranking 
determining means. 

10. A hypertext document retrieving apparatus accord- 
ing to claim 1 in which one hypertext document hav- 
ing contents corresponding to a plurality of 
meanings respectively identified by a reference 
label is divided into a plurality of blocks by the
hypertext document table preparing means to 
include one reference label in a top of each block, 
and one hypertext document information is pre-
pared for each block of the hypertext document by
the hypertext document table preparing means.

11. A hypertext document retrieving apparatus accord- 
ing to claim 1 in which a predetermined number of 
high-ranking particular hypertext documents are 
selected from among the particular hypertext docu-
ments by the document ranking determining
means, a plurality of related words appearing in the 
high-ranking particular hypertext documents are 
extracted from the high-ranking particular hypertext 
documents by the document ranking determining
means, a plurality of importance degrees of the 
related words are calculated from a plurality of 
occurrence frequencies of the related words in the 
high-ranking particular hypertext documents by the 
document ranking determining means, a predeter-
mined number of high-ranking related words are
selected from the related words ranked according
to the importance degrees of the related words by
the document ranking determining means, and a
plurality of selection buttons for the high-ranking
related words are displayed with the indexes of the 
particular hypertext documents by the retrieval 
result displaying means. 

12. A hypertext document retrieving apparatus accord- 
ing to claim 1 in which a predetermined number of 
high-ranking particular hypertext documents are 
selected from among the particular hypertext docu- 
ments by the document ranking determining 
means, a plurality of related words appearing in the 
high-ranking particular hypertext documents and a 
plurality of particular parent documents corre- 
sponding to the high-ranking particular hypertext 
documents are extracted from the high-ranking par- 
ticular hypertext documents by the document rank- 
ing determining means, a plurality of importance 
degrees of the related words are calculated from a 
plurality of occurrence frequencies of the related 
words in the high-ranking particular hypertext docu- 
ments and the particular parent documents by the 
document ranking determining means, a predeter- 
mined number of high-ranking related words are 
selected from the related words ranked according 
to the importance degrees of the related words by 
the document ranking determining means, and a 
plurality of selection buttons for the high-ranking 
related words are displayed with the indexes of the 
particular hypertext documents by the retrieval 
result displaying means. 

13. A hypertext document retrieving apparatus accord- 
ing to claim 1 in which a predetermined number of 
high-ranking particular hypertext documents are 
selected from among the particular hypertext docu- 
ments by the document ranking determining 
means, a plurality of related words appearing in the 
high-ranking particular hypertext documents are 
extracted from the high-ranking particular hypertext 
documents by the document ranking determining 
means, an occurrence frequency of each related 
word in the high-ranking particular hypertext docu- 
ments is calculated by the document ranking deter- 
mining means on condition that the related word 
appearing in one high-ranking particular hypertext 
document is double-counted in cases where an 
occurrence position of the related word is near to an 
occurrence position of the keyword, a plurality of 
importance degrees of the related words are calcu- 
lated from the occurrence frequencies of the related 
words by the document ranking determining 
means, a predetermined number of high-ranking 
related words are selected from the related words 
ranked according to the importance degrees of the 
related words by the document ranking determining 
means, and a plurality of selection buttons for the 
high-ranking related words are displayed with the 
indexes of the particular hypertext documents by 
the retrieval result displaying means. 

14. A hypertext document retrieving apparatus accord-
ing to claim 1 in which a predetermined number of
high-ranking particular hypertext documents are 
selected from among the particular hypertext docu-
ments by the document ranking determining
means, a plurality of related words appearing in the 
high-ranking particular hypertext documents and a 
plurality of particular parent documents corre- 
sponding to the high-ranking particular hypertext 
documents are extracted from the high-ranking par-
ticular hypertext documents by the document rank-
ing determining means, an occurrence frequency of
each related word in the high-ranking particular 
hypertext documents and the particular parent doc-
uments is calculated by the document ranking
determining means on condition that the related 
word appearing in one high-ranking particular 
hypertext document or one particular parent docu- 
ment is double-counted in cases where an occur-
rence position of the related word is near to an
occurrence position of the keyword, a plurality of 
importance degrees of the related words are calcu- 
lated from the occurrence frequencies of the related 
words by the document ranking determining 
means, a predetermined number of high-ranking
related words are selected from the related words 
ranked according to the importance degrees of the 
related words by the document ranking determining 
means, and a plurality of selection buttons for the 
high-ranking related words are displayed with the
indexes of the particular hypertext documents by
the retrieval result displaying means.

15. A hypertext document retrieving apparatus accord-
ing to claim 1 in which a plurality of keywords are
received by the keyword receiving means, an 
occurrence frequency TF of one keyword in one 
unified hypertext document is calculated by the 
document ranking determining means for each key-
word and each unified hypertext document, an
inverse document frequency IDF defined as an 
inverse value of the number of particular hypertext 
documents in which one keyword appears is calcu-
lated by the document ranking determining means
for each keyword, a product TF*IDF of one occur-
rence frequency TF and one inverse document fre-
quency IDF is calculated by the document ranking
determining means, a plurality of products for the 
keywords are summed by the document ranking 
determining means to produce a summed product
as an estimated value for each unified particular 
hypertext document, and the importance degrees 
of the unified hypertext documents are determined 
according to the estimated values by the document 
ranking determining means.

16. A hypertext document retrieving apparatus accord-
ing to claim 15 in which one estimated value for one
unified particular hypertext document is increased
to heighten the rank of the particular hypertext doc-
ument in cases where two or more keywords
appear in the unified particular hypertext document 
or a distance of two keywords in the unified particu- 
lar hypertext document is within a predetermined 
number of words. 